Microsoft Fabric
What's new in SQL Server 2025
Add deep AI integration with built-in vector search and DiskANN optimizations, plus native support for large object JSON and new Change Event Streaming for live data updates. Join and analyze data faster with lakehouse shortcuts in Microsoft Fabric that unify multiple databases — across different SQL Server versions, clouds, and on-prem — into a single, logical schema without moving data. Build intelligent apps, automate workflows, and unlock rich insights with Copilot and the unified Microsoft data platform, including seamless Microsoft Fabric integration, all while leveraging your existing SQL skills and infrastructure. Bob Ward, lead SQL engineer, joins Jeremy Chapman to share how the latest SQL Server 2025 innovations simplify building complex, high-performance workloads with less effort.

Run natural language semantic search directly in SQL Server 2025. Vector search and DiskANN work efficiently on modest hardware — no GPU needed. Get started.

Run NoSQL in SQL. Store and manage large JSON documents directly in SQL Server 2025. Insert, update, and query JSON data with native tools. Check it out.

Avoid delays. Reduce database locking without code changes to keep your apps running smoothly. See the new Optimized Locking in SQL Server 2025.

QUICK LINKS:
00:00 — Updates to SQL Server 2025
00:58 — Search and AI
03:55 — Native JSON Support
06:41 — Real-Time Change Event Streaming
08:40 — Optimized Locking for Better Concurrency
10:33 — Join SQL Server data with Fabric
13:53 — Wrap up

Link References
Start using SQL Server 2025 at https://aka.ms/GetSQLServer2025

Unfamiliar with Microsoft Mechanics? As Microsoft’s official video series for IT, you can watch and share valuable content and demos of current and upcoming tech from the people who build it at Microsoft.
Subscribe to our YouTube: https://www.youtube.com/c/MicrosoftMechanicsSeries
Talk with other IT Pros, join us on the Microsoft Tech Community: https://techcommunity.microsoft.com/t5/microsoft-mechanics-blog/bg-p/MicrosoftMechanicsBlog
Watch or listen from anywhere, subscribe to our podcast: https://microsoftmechanics.libsyn.com/podcast
Keep getting this insider knowledge, join us on social:
Follow us on Twitter: https://twitter.com/MSFTMechanics
Share knowledge on LinkedIn: https://www.linkedin.com/company/microsoft-mechanics/
Enjoy us on Instagram: https://www.instagram.com/msftmechanics/
Loosen up with us on TikTok: https://www.tiktok.com/@msftmechanics

Video Transcript:

- Today we’ll look at the AI integration, developer updates, and performance improvements that make SQL Server 2025 a major upgrade. We’ve got a lot to unpack here, so we’re going to waste no time and get straight into this with lead SQL engineer, Bob Ward. Welcome back to the show.

- So great to be back.

- So SQL Server 2025, it’s brand new. It’s in public preview right now. So what’s behind the release and what’s new?

- There are three major areas of updates that we focus on in this release. First, we have deep AI integration. For example, we now have built-in vector search support for more accurate and efficient data retrieval with some under-the-hood optimizations using DiskANN. Second, if you’re a developer, this is the most significant release of SQL in the last decade. You know, some of the highlights are native support for the JSON data type and new change event streaming capabilities for real-time updates. And the third area is improved analytics, where we’re going to make it easy to mirror your SQL Servers into Microsoft Fabric without moving the data.
- And all of these are very significant updates. So why don’t we start with what’s new in search and AI?

- Great, let’s get going. As I’ve mentioned, we’ve integrated AI directly into the database engine to give you smarter, intelligent searching. With vector search capabilities built in, you can do semantic search over your data to find matches based on similarity versus keywords. For example, here I have a database with a table called ProductDescription, and I want to search using SQL queries against the Description column for intelligent search. Typically, you’d use full-text search for this. Now I’ve built this out, but what about these natural language phrases? Will they work? They don’t. And even when I use LIKE clauses, as you can see here, or CONTAINS, or even FREETEXT, none of these methods returns what I’m looking for. Instead, this is where natural language with vector search in SQL Server 2025 shines. As a developer, I can get started even locally on my laptop, no GPU required. I’m using the popular framework, Ollama, to host a free open-source embeddings model from Hugging Face. This will convert our data into vectors, including query prompts, and I declare it using this CREATE EXTERNAL MODEL statement. Then I’m able to go in and build a table using the new built-in vector type to store what’s called embeddings in a binary format. My table has keys pointing back to my description data, and then I can use a built-in T-SQL function to generate embeddings based on Ollama and store them. For vector search to work, I need to create a vector index, and it’s also performance optimized using disk-based approximate nearest neighbor, or DiskANN, which is a new way to offload what you’d normally want to run completely in memory to an index stored on disk. I have a stored procedure to convert the query prompts into embeddings so they can be used to find matching embeddings in the vector index. So now I have everything running locally on my laptop running SQL Server. Let’s see how it works. I’ll try this natural language prompt, like I showed earlier. And it worked. I get a rich set of results, with matching information based on my search to find products in the database. And I can even use Copilot from here to explore more about SQL data. I’ll prompt it to look for my new table. And you can see the response here, finding our new table. And I can ask it to pull up a few embedding values with product names and descriptions. And as you saw, the result using our open-source embeddings model came back in a few different languages. And the good news is that if your data contains multiple languages, it’s easy to use different embedding models. For example, here I’ve wired up Azure OpenAI’s ada-002 embeddings model, optimized for multiple languages, without even changing my code. And now I can even search using Mandarin Chinese and get back matching results.

- And DiskANN and vector-based search are both massive updates that really go hand in hand to enable better natural language querying on modest hardware. So what about all the developer updates?

- With these updates, things get so much more efficient for developers. With the new JSON data type, you can bring NoSQL into your SQL relational database. Let me show you how. I’ve created a database called Orders and a table called Orders. Notice here the new JSON data type, which can store a massive two gigabytes of JSON per document in this native data type. Now let’s look at a couple of examples.
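For reference, a table using the new native json type could look something like the minimal sketch below. The column names and the sample document are illustrative assumptions, not the demo's actual objects, and JSON_OBJECTAGG is shown with the syntax from the preview documentation.

-- An orders table where each order is stored as a whole JSON document
-- in the new native json data type (up to 2 GB per document).
CREATE TABLE dbo.Orders
(
    order_id  int IDENTITY PRIMARY KEY,
    order_doc json NOT NULL
);

-- Insert a JSON document in its native format.
INSERT INTO dbo.Orders (order_doc)
VALUES ('{"order_id": 1, "item": "Road Bike", "quantity": 1}');

-- Pull individual values back out as a result set.
SELECT order_id,
       JSON_VALUE(order_doc, '$.item')     AS item,
       JSON_VALUE(order_doc, '$.quantity') AS quantity
FROM dbo.Orders;

-- Produce key-value pairs of order ID to document with the new aggregate.
SELECT JSON_OBJECTAGG(CAST(order_id AS varchar(20)) : order_doc)
FROM dbo.Orders;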
First, I can easily insert JSON documents in their native format directly into the table, and I’ll show you some of the JSON functions that you can use to process this new JSON type. JSON_VALUE will pull a particular value out of a JSON document and bring it back in result set format. And I can just dump out all the JSON values, so each document will appear as a separate row in its native JSON format. But instead of just doing that, I have aggregate functions. This one takes all the rows of JSON in the table and produces a single array: a single JSON document with all of the rows in the native JSON type. Key-value pairs are also popular in JSON, and I can use the new JSON_OBJECTAGG function to take the order ID key and the JSON document and produce a set of key-value pairs. And I can modify the JSON type directly from here too. Notice, for order_id 1, the quantity is also 1. I’ll run this update to modify the value. And when it’s finished, the quantity for order_id 1 has been updated with the value of 2 directly in the JSON. Now that’s a good example of using the JSON type. So let me show you how this works with a JSON index. I’ve got a different database for contacts, along with a contacts table that uses a JSON document as one of its properties. I can create a JSON index on top of that JSON document, like this. Now I’ve got some sample data that are JSON documents. And in a second, I’m going to push those into our database. And as I scroll, you’ll see that this has nested tags as properties in the JSON document. Now I’ll run the query so I can insert these rows with the names of each tag. Let’s go look at the output. I’m using JSON_VALUE for the name, but I’m using JSON_QUERY because the tags are nested. Now I’ll show you an example searching with the JSON index. I’m using the new JSON_CONTAINS function to find tags called fitness that are deeply nested in the JSON document. And I can run that and find the right tags. And even in the execution plan, you can see here that it shows we’re using the new JSON index to help go find that information.

- That’s a big deal. And like you said, there’s a lot happening natively in JSON, and now you’ve got the benefits of SQL for joins, and security, and a lot more.

- You know, and for developers who use change data capture, things become a lot easier with change event streaming. Here, we’re reducing I/O overhead and sending transaction log changes directly to your application. To get started with change event streaming for our orders database, I’ll run the stored procedure to enable streaming for the database. You can see the table we’re going to use to track changes is a typical type of orders table. Here I’ve created what’s called an event stream group. This is where I’ve configured event streaming to tell it the location of our Azure Event Hub to stream our data, and I’ve added my credentials. Then I’ve configured the table orders to be part of the event streaming group. I’ve run these procedures to make sure that my configuration is correct. So let’s do something interesting. I’m going to automate a workflow using agents to listen for changes as they come in and try to resolve any issues. First, I’ve created an Azure function app, and using my function app, I have an agent running in the Azure AI service called ContosoShippingAgent. It’s built to take shipment information, analyze it, and decide whether something can be done to help. For example, resolving a shipping delay. I’ve started my Azure function.
This function is waiting for events to be sent to Azure Event Hub in order to process them. Now, in SQL, I’ll insert a new order. Going back over to my Azure function, you’ll see how the event is processed. In the code, first, we’re dumping out the raw cloud event that I showed earlier. Notice the operation is an insert. It’s going to dump out some of the different fields we’ve parsed out of the data, the column names, the metadata, and then the row itself. Notice that because the shipment date is more than 75 days past our sales date, it will call our agent. The agent then comes back with a response. It looked at the tracking details and determined that it can change the shipping provider to expedite our delayed shipment, and it contacted the customer directly with the updated shipping info.

- And everybody likes faster shipping. So speaking of things that are getting faster, it’s kind of a tradition on Mechanics that we cover the speed-ups for SQL Server. So what are the speed-ups and the performance optimizations for ‘25?

- Well, there’s a lot, but my favorite one improves application concurrency. We’ve improved the internals of how locking works without application code changes. And I’ve got an example of this running. I have a lock escalation problem that I need to resolve. I’m going to go update about 2,500 rows in this table just to show what happens, then how we’ve solved for it. So running this query against that Dynamic Management View, or DMV, shows locks that have accumulated, about 2,500 key locks here and 111 page locks. So what happens if I run enough updates against the table that would cause a lock escalation? Here, I’ll update 10,000 rows in the system. But you can see with the locks that this has been escalated to an object lock. It’s not updating the entire table, but it’s going to cause a problem. Because I’ve got a query over here that can update the maximum value in just one row and it’s going to get blocked, but it shouldn’t have to be. You can see here from the blocking query that’s running that it’s blocked on that original session, and I’m not actually updating a row that’s affected by the first one. This is the problem with lock escalation. Now let’s look at a new option called optimized locking in SQL Server 2025. Okay, let’s go back to where I updated 10,000 rows and look at the lock. Notice how in this particular case I have a transaction lock. It’s an intent exclusive lock for the table, but only a transaction lock for that update. If I use this query to update the max, you’ll see that we are not blocked. And by looking at the locks, each item has specific transaction locks, so we’re not blocking each other. And related to this, we’ve also solved another problem where two unrelated updates can get blocked. We call this lock after qualification.

- Okay, so it’s pinpointing the exact lock type, so you’ll get fewer locks in the end. So why don’t we move on though from locks to joins?

- Sure. Microsoft Fabric is amazing. You can pull multiple databases and multiple data types into a unified data platform. Imagine you have two different SQL Servers in different clouds and on-prem, and you just want to join this data together in an easy way without migrating it. With Fabric, you can. I have a SQL Server 2022 instance with a database, and we’ve already mirrored the product tables from that database into Fabric. I’ll show you the mirroring configuration process for a SQL Server 2025 instance with different but related tables.
These are similar to the steps for mirroring any SQL Server. I’ve created a database connection for SQL Server 2025. Now I’ll pick all the tables in our database and connect. I’ll leave the name as is, AdventureWorks, and we’re ready to mirror our database. You can see now that the replication process has started for all the tables. All the rows have been replicated for all the columns on all the tables in my database, and they’ve been mirrored into Fabric. Now let’s query the data using the SQL analytics endpoint. And you can see that the tables we previously had in our database in SQL Server are now mirrored into OneLake. Let’s run a query, and I’ll use Copilot to do that. Here’s the Copilot code with explanations. Now I’ll run it. And as it completes, there are our top customers by sales. Now what if we wanted to do a join across the other SQL Server? It’s possible. But normally, there are a lot of manual pieces to do this. Fabric can make that easier using a lakehouse. So let’s create a new lakehouse. I just need to give it a name, AdventureWorks, and confirm. Now notice there are no tables in this lakehouse yet, so let’s add some. And for that, I’ll use a shortcut. A shortcut uses items in OneLake, like the SQL Server databases we just mirrored. So I’ll add the AdventureWorks database. And scrolling down, I’ll pick all the tables I want. Now I’ll create it. And we’re not storing the data separately in the lakehouse. It’s just a shortcut, like an active read link to the source data, which is our mirrored database, and therefore something that already exists in OneLake. And now you can see I’ve got these objects here. This icon means that these are shortcuts from another table. So now, let’s get data from the other warehouse, the SQL Server 2022 instance, which was ADW_products. Again, here, I’ll pick the tables that I want and Create. That’s it. So I can go and look at product to make sure I’ve got my product data. Now, let’s try to query this as one database and use another analytics endpoint directly against the lakehouse itself. So basically it thinks all the tables are just part of the unified schema now. Let’s open up Copilot and write a prompt to pull my top customers by products and sales. And it will be able to work directly against all of these connected databases because they are all in the same schema. And there you go. I have a list of all the data I need in one logical database.

- And this is really great. And I know now that everything’s in OneLake, there’s also a lot more that you can do with that data.

- With the lakehouse, the sky’s the limit. You can use Power BI, or any of those services that are in the unified data platform, Microsoft Fabric.

- Okay, so now we’ve seen all the updates with SQL Server 2025. To everyone watching, what’s the best thing they can do to get started?

- Well, the first thing is to start using it. SQL Server 2025 is ready for you to download and install on the platform of your choice. You’ll find it at aka.ms/GetSQLServer2025.

- So thanks so much for sharing all the updates, Bob, and thank you for joining us today. Be sure to subscribe for more, and we’ll see you again soon.
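For reference, the vector search flow Bob demonstrated follows roughly the pattern below. Treat it as a minimal, hedged T-SQL sketch rather than the demo code itself: the model name, endpoint, schema and table names, and the exact public preview syntax for CREATE EXTERNAL MODEL, the embeddings function, and the vector index options are assumptions to verify against the SQL Server 2025 documentation at aka.ms/GetSQLServer2025.

-- 1. Register a locally hosted Ollama embeddings model (endpoint, model name,
--    and option names here are illustrative assumptions for the public preview).
CREATE EXTERNAL MODEL ollama_embeddings
WITH (
    LOCATION = 'https://localhost:11434/api/embed',
    API_FORMAT = 'Ollama',
    MODEL_TYPE = EMBEDDINGS,
    MODEL = 'nomic-embed-text'
);

-- 2. Store embeddings in the new vector type, keyed back to the source rows
--    (768 dimensions matches the assumed model; adjust for yours).
CREATE TABLE dbo.ProductDescriptionEmbeddings
(
    ProductDescriptionID int PRIMARY KEY,
    Embedding vector(768) NOT NULL
);

-- 3. Generate and store embeddings with the built-in T-SQL function referenced
--    in the episode (verify the exact signature in the preview docs).
INSERT INTO dbo.ProductDescriptionEmbeddings (ProductDescriptionID, Embedding)
SELECT ProductDescriptionID,
       AI_GENERATE_EMBEDDINGS(Description USE MODEL ollama_embeddings)
FROM SalesLT.ProductDescription;

-- 4. Create the DiskANN-backed vector index (approximate option syntax).
CREATE VECTOR INDEX vix_product_embeddings
ON dbo.ProductDescriptionEmbeddings (Embedding)
WITH (METRIC = 'cosine', TYPE = 'diskann');

-- 5. Embed the natural language prompt and rank by similarity. VECTOR_DISTANCE
--    is an exact comparison; the DiskANN index is what accelerates the
--    approximate search path described in the episode.
DECLARE @q vector(768);
SELECT @q = AI_GENERATE_EMBEDDINGS(N'a lightweight bike for long weekend rides' USE MODEL ollama_embeddings);

SELECT TOP (10) pd.ProductDescriptionID, pd.Description
FROM dbo.ProductDescriptionEmbeddings AS e
JOIN SalesLT.ProductDescription AS pd
    ON pd.ProductDescriptionID = e.ProductDescriptionID
ORDER BY VECTOR_DISTANCE('cosine', e.Embedding, @q);

Along the same lines, the lock counts shown in the optimized locking demo come from the sys.dm_tran_locks DMV. A simple way to see what your own session is holding:

-- Count locks held by the current session, grouped by resource type and mode.
SELECT resource_type, request_mode, COUNT(*) AS lock_count
FROM sys.dm_tran_locks
WHERE request_session_id = @@SPID
GROUP BY resource_type, request_mode
ORDER BY lock_count DESC;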
Data security controls in OneLake

Unify and secure your data — no matter where it lives — without sacrificing control using OneLake security, part of Microsoft Fabric. With granular permissions down to the row, column, and table level, you can confidently manage access across engines like Power BI, Spark, and T-SQL, all from one place. Discover, label, and govern your data with clarity using the integrated OneLake catalog that surfaces the right items fast. Aaron Merrill, Microsoft Fabric Principal Program Manager, shows how you can stay in control, from security to discoverability — owning, sharing, and protecting data on your terms.

Protect sensitive information at scale. Set precise data access rules — down to individual rows. Check out OneLake security in Microsoft Fabric.

No data duplication needed. Hide sensitive columns while still allowing access to relevant data. See it here with OneLake security.

Built-in compliance insights. Streamline discovery, governance, and sharing. Get started with the OneLake catalog.

QUICK LINKS:
00:00 — OneLake & Microsoft Fabric core concepts
01:28 — Table level security
02:11 — Column level security
03:06 — Power BI report
03:28 — Row level security
04:23 — Data classification options
05:19 — OneLake catalog
06:22 — View and manage data
06:48 — Governance
07:36 — Microsoft Fabric integration
07:59 — Wrap up

Link References
Check out our blog at https://aka.ms/OneLakeSecurity
Sign up for a 60-day free trial at https://fabric.microsoft.com

Unfamiliar with Microsoft Mechanics? As Microsoft’s official video series for IT, you can watch and share valuable content and demos of current and upcoming tech from the people who build it at Microsoft.
Subscribe to our YouTube: https://www.youtube.com/c/MicrosoftMechanicsSeries
Talk with other IT Pros, join us on the Microsoft Tech Community: https://techcommunity.microsoft.com/t5/microsoft-mechanics-blog/bg-p/MicrosoftMechanicsBlog
Watch or listen from anywhere, subscribe to our podcast: https://microsoftmechanics.libsyn.com/podcast
Keep getting this insider knowledge, join us on social:
Follow us on Twitter: https://twitter.com/MSFTMechanics
Share knowledge on LinkedIn: https://www.linkedin.com/company/microsoft-mechanics/
Enjoy us on Instagram: https://www.instagram.com/msftmechanics/
Loosen up with us on TikTok: https://www.tiktok.com/@msftmechanics

Video Transcript:

-As you build AI and analytic workloads, unifying your data from wherever it lives and making it accessible doesn’t have to come at the cost of security. In fact, today we dive deeper into Microsoft’s approach to data unification, accessibility, and security with OneLake, part of Microsoft Fabric, where we’ll focus on OneLake’s security control set and how it complements data discovery via the new OneLake catalog.

-Now, in case you’re new to OneLake and Microsoft Fabric, I’ll start by explaining a few core concepts. OneLake is the logical multi-cloud data lake that is foundational to Microsoft Fabric, Microsoft’s fully managed data analytics and AI platform. OneLake, with its support for open data formats, provides a single and unified place across your entire company for data to be discovered, accessed, and controlled across your data estate. Data can reside anywhere, and you can connect to it using shortcuts or via mirroring. And once in OneLake, you have a single place where data can be centrally classified and labeled as the basis for policy controls.
You can then configure granular, role-based permissions that can apply down to the folder level for unstructured data and by table for structured data.

-Then all the way down to the column and row levels within each table. This way, security is enforced across all connected data, meaning that whether you’re accessing the data through Spark, Power BI, T-SQL, or any other engine, it’s protected, and you have the controls to allow or limit access to data on your terms. In fact, let me show you a few examples for enforcing OneLake security at all of these levels. I’ll start with an example showing OneLake security at the table level. I want to grant our suppliers team access to a specific table in this lakehouse. I’ll create a OneLake security role to do that. So I’ll just give it a name, SuppliersReaders. Then I’ll choose selected data and find the table that I want to share by expanding the table list, pick suppliers, and then confirm.

-Now, I just need to assign the right users. I’ll just add Mona in this case, and create the role. Then if I move over to Mona’s experience, I can run queries against the supplier data in the SQL endpoint. But if I try to query any other table, I’m blocked, as you can see here. Now, let me show you another option. This time, I’ll lock access down to the column level. I want to grant our customer relations team access to the data they need, but I don’t want to give them access to PII data. Using OneLake security controls, I can create a role that restricts access to sensitive columns. Like before, I’ll name it. Then I need to select my data. This time, I’ll choose three different tables for customer and order data. But notice this grayed-out legacy orders table here that we would like to apply column security to as well. I don’t own the permissions for this table because it’s a shortcut to other data. However, the owner of that data can grant permission to it using the steps I’ll show next. From the role I just created, I’ll expand my tables. And for the customers table, I’ll enable column security. Once I confirm, I can select the columns that we don’t want them to see, remove them, and save.

-Now, let’s look at the results of this from another engine, Power BI, while building a report. I’ll choose a semantic model for my Power BI report. With the column level security in place, notice the sensitive columns I removed before, contact name and address, are hidden from me. And when I expand the legacy orders table, which was a shortcut, it’s also not showing PII columns. Now, some scenarios require that security controls are applied where records might be interspersed within the same table, so a row-level filter is needed. For example, our US-based HR team should only see data for US-based employees. I’ve created another security role with the right data selected, HRUS.

-Now, I’ll move to my tables, choose from the options for this employees table, and select row security. Row-level security in OneLake uses SQL statements to limit what people can see. I’ll do that here with a simple select statement to limit country to USA. Now, from the HR team’s perspective, they can start to query the data using another engine, Spark, to analyze employee retention. But only across US-based employees, as you can see from the country column. And as mentioned, this applies to all engines, no matter how you access it, including the Parquet files directly in OneLake. Next, let’s move on to data classification options that can be used to inform policy controls.
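As a point of reference before the classification discussion, the row-level rule in that HR example is just a SQL statement scoped to the role. A minimal illustrative form (the table and column names here are assumed, not taken from the demo) looks like this:

-- Row-level rule for the HRUS role: members only see US-based employees.
SELECT * FROM employees WHERE Country = 'USA';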
Here, the good news is the same labels you’ve defined in Microsoft Purview for your organization, which are used in Microsoft 365 for emails, messaging, files, sites, and meetings, can be applied to data items in OneLake.

-Additionally, Microsoft Purview policy controls can be used to automatically label content in OneLake. And another benefit I can show you from the lineage view is label inheritance. Notice this Lakehouse is labeled Non-Business, as is NorthwindTest, but look at the connected data items on the right of NorthwindTest. They are also non-business. If I move into the test lakehouse and apply a label either automatically or manually to my data, like I’m doing here, then when I move back to the lineage view, my downstream data items, like this model and the SQL analytics endpoint below it, have automatically inherited the upstream label.

-So now that we’ve explored OneLake security controls, their implementation, and enforcement, let’s look at how this works hand in hand with the OneLake catalog for data discovery and management. First, to know that you’re in the right place, you can use branded domains to organize collections of data. I’ll choose the sales domain. To get to the data I want, I can view my items as the ones I own, endorsed items, and my favorites. I can filter by workspace. And on top, I can select the type of data item that I’m looking for. Then if I move over to tags, I can find ones associated with cost centers, dates, or other collection types.

-Now, let’s take a look at a data item. This shows me more detail, like the owner and location. I can also see table schemas and more below. I can preview data within the tables directly from here. Then the lineage tab shows me a list of connected and related items. Lastly, the monitor tab lets me track data refresh history. Now, let me show you how as a data owner you can view and manage these data items. From the settings of this lakehouse, I can change its properties and metadata, such as the endorsement, or update the sensitivity label. And as the data owner, I can also share it securely internally or even externally with approved recipients. I’ll choose a colleague, dave@contoso.com, and share it.

-Next, the govern tab in the OneLake catalog gives you even more control as a data owner, as well as recommendations to make data more secure and compliant. You’ll find it on the OneLake catalog main page. This gives me key insights at a glance, like the number and type of items I own. And when I click into view more, I see additional information like my data hierarchy. Below that, item inventory and data refresh status. Sensitivity label coverage gives me an idea of how compliant my data items are. And I can assess data completeness based on whether an item is properly tagged, described, and endorsed across the items I own. Back on the main view, I can see governance actions tailored specifically to my data, like increasing sensitivity label coverage, and more.

-The OneLake catalog is integrated across Microsoft Fabric experiences to help people quickly discover the items they need. And it’s also integrated with your favorite Office apps, including Microsoft Excel, where you can use the Get Data control to select and access data in OneLake. And right in context, without leaving the app, you can define what you want and pull it directly into your Excel file for analysis. The OneLake catalog is the one place where you can discover the data that you want and manage the data that you own.
And combined with OneLake security controls, you can do all of this without increasing your data security risks.

-To find out more and get started, check out our blog at aka.ms/OneLakeSecurity. Also, be sure to sign up for a 60-day free trial at fabric.microsoft.com. And keep watching Mechanics for the latest updates across Microsoft, subscribe to our channel, and thanks for watching.
Connect to any data with Shortcuts, Mirroring and Data Factory using Microsoft Fabric

Easily access and unify your data for analytics and AI — no matter where it lives. With OneLake in Microsoft Fabric, you can connect to data across multiple clouds, databases, and formats without duplication. Use the OneLake catalog to quickly find and interact with your data, and let Copilot in Fabric help you transform and analyze it effortlessly. Eliminate barriers to working with your data using Shortcuts to virtualize external sources and Mirroring to keep databases and warehouses in sync — all without ETL. For deeper integration, leverage Data Factory’s 180+ connectors to bring in structured, unstructured, and real-time streaming data at scale. Maraki Ketema from the Microsoft Fabric team shows how to combine these methods, ensuring fast, reliable access to quality data for analytics and AI workloads.

Access your data instantly. Connect to Azure, AWS, Snowflake & on-prem sources in OneLake without moving a single file. Get started with Microsoft Fabric.

Replicate databases with near-zero latency. Fast, reliable analytics. Check out Mirroring in Microsoft Fabric.

ETL, data prep & movement at scale. Fabric Data Factory makes it simple & efficient to move data faster. See how it works. Watch our video here.

QUICK LINKS:
00:00 — Access data wherever it lives
00:42 — Microsoft Fabric background
01:17 — Manage data with Microsoft Fabric
03:04 — Low latency
03:34 — How Shortcuts work
06:41 — Mirroring
08:10 — Open mirroring
08:40 — Low friction ways to bring data in
09:32 — Data Factory in Microsoft Fabric
10:52 — Build out your data flow
11:49 — Use built-in AI to ask questions of data
12:56 — OneLake catalog
13:36 — Data security & compliance
15:10 — Additional options to bring data in
15:42 — Wrap up

Link References
Watch our show on Real-Time Intelligence at https://aka.ms/MechanicsRTI
Check out Open Mirroring at https://aka.ms/FabricOpenMirroring

Unfamiliar with Microsoft Mechanics? As Microsoft’s official video series for IT, you can watch and share valuable content and demos of current and upcoming tech from the people who build it at Microsoft.
Subscribe to our YouTube: https://www.youtube.com/c/MicrosoftMechanicsSeries
Talk with other IT Pros, join us on the Microsoft Tech Community: https://techcommunity.microsoft.com/t5/microsoft-mechanics-blog/bg-p/MicrosoftMechanicsBlog
Watch or listen from anywhere, subscribe to our podcast: https://microsoftmechanics.libsyn.com/podcast
Keep getting this insider knowledge, join us on social:
Follow us on Twitter: https://twitter.com/MSFTMechanics
Share knowledge on LinkedIn: https://www.linkedin.com/company/microsoft-mechanics/
Enjoy us on Instagram: https://www.instagram.com/msftmechanics/
Loosen up with us on TikTok: https://www.tiktok.com/@msftmechanics

Video Transcript:

- If you’ve struggled with accessing data for your analytics and AI workloads, as it’s spread across different clouds or databases and in different formats, today we will look at the options available to you for connecting and accessing data wherever it lives with the unified data lake, OneLake, part of the cloud data analytics and AI platform, Microsoft Fabric. And importantly, we’ll show you how easy it is for your team members to find the data that you brought in with the new OneLake catalog, and how you can use Copilot in Fabric as you work to interact with your data wherever it lives from OneLake. And joining me today from the Microsoft Fabric product team is Maraki Ketema. Welcome to the show.

- Thanks for having me.

- And thanks so much for joining us today.
But before we get into this, why don’t we set a bit of context for anyone who’s new to Microsoft Fabric? So, Microsoft Fabric is a preintegrated, optimized SaaS environment, which provides a comprehensive data analytics and AI platform with built-in capabilities for data integration, data engineering, data science, data warehousing, real-time intelligence, data visualization, and overall data management. Underpinning Fabric is its multi-cloud data lake, OneLake, which gives you a central point for data to be discovered and accessed wherever it resides across your data estate at scale. Now, we’ve covered Microsoft Fabric in a lot of past shows, but today we really want to be able to specifically demystify how it can help you get a better handle on your data.

- Well, it helps on a number of levels. You’ve already mentioned scalability, and with all of the integrated capabilities for data teams to collaborate on building clean, quality data, it can be done at scale for any use case. And OneLake really is the key to getting a handle on your data by making it accessible with support for open formats like Delta Parquet and Iceberg. This helps eliminate traditional barriers to working with your data, and we give you a variety of methods to bring your data into OneLake, like Shortcuts, where you can virtualize data from where it’s stored, which creates a pointer to any structured, open, file-based tabular data or unstructured files, even images and multimedia. All this happens without duplicating the data. Or there are options for Mirroring, where you can create an always up-to-date replica of the source in Fabric. And this is great for databases and data warehouses with proprietary formats where your business-critical data may be stored. Now both of these options can be used like any other native data in OneLake, and they require no ETL. Then for all of your other sources that require data transformation or read or write capabilities, you can use the hundreds of connectors provided by Data Factory in Microsoft Fabric to make your data natively available in OneLake. And to bring in streaming data, you’ll use Real-Time Intelligence in Microsoft Fabric. You’ll likely use these techniques to different extents depending on your data and AI needs, and whichever method you use to connect data, we make it available with minimal latency. This is super important, for example, for real-time or gen AI tasks because they’re less predictable: as a user or agent interacts, on the backend this can quickly create a series of requests to retrieve data, which need to happen fast to ground the AI so that responses aren’t delayed. Fabric takes care of all of this for you at scale and at low latency.

- So quality data then becomes super accessible whenever you need it and wherever it lives. Why don’t we show them a few examples of this?

- Sure. So, today I’m going to walk you through an e-commerce system, and it’s for a Supermart with a grocery department where we need to quickly understand demand versus supply as well as market competition over prices and get a 360 view of operations and customer experiences. Now, different teams, including marketing, analytics, and IT, are collaborating together in a single Fabric workspace. Now here the marketing team creates promotions daily, and they work with different vendors who are using different systems to store data, and there’s no standard file type. The good news is that we can connect to all of these different systems using Shortcuts. Let me show you how that works.
Here under Get Data, I can see my options to bring data in. I’ll choose a new shortcut. You’ll see that I have both Microsoft and non-Microsoft locations. In this case, I want to connect to Amazon S3 for unstructured data. From here, if I don’t already have a connected data source, I can create a new connection using credentials for the service. But to save time, I’ll use an existing connection. I’ll choose the second option here. I can explore the data available to me and I can choose the specific folders I want. I’ll pick a few for Contoso and confirm. Now the data’s in OneLake, and I can expand the folders and look at different data, like these markdown files with text, which contain customer feedback, and I have a nice preview of the data to understand what’s in it. Additionally, I have some image data on my local drive that I want to share with others on my team as we’re trying to figure out the best placement for in-store promotions. The good news is that I can also shortcut to all of this data in OneLake directly from my desktop. Let’s take a look. Here I am in Windows File Explorer, and I’m connected to OneLake and I can interact with these files and sync them right from here. In fact, here I’m adding an image file from our grocery department, and from the status I can see that it’s already synced. Now if I move back over to Fabric, you’ll see that it’s just synced into my lakehouse view. From here, I can preview the image right away. So now I have the information I need to start analyzing customer sentiment and where we can place point-of-sale promotions. Again, in both examples, the file data still remains at the source. Just like shortcuts on your desktop, the data doesn’t actually live in OneLake, but it always stays in sync. Shortcuts in Microsoft Fabric are supported for open storage formats like Microsoft Dataverse, Azure Data Lake Storage, Google Cloud Storage, Databricks, Amazon S3, any S3-compatible stores, and more. And you can also use Shortcuts for on-premises data sources using the Fabric on-premises data gateway.

- And beyond your file data, your Supermart is probably dependent on operational data that’s sitting in databases and warehouses, all of which might have their own proprietary formats. So what’s the path of least effort then to bring that data in?

-So this is where Mirroring in Microsoft Fabric comes into play. It makes it easy to replicate data into OneLake, and storage is included as part of your existing Microsoft Fabric capacity. Let’s jump in. Here, you can see my sales dashboard, which is broken down by category and location, and even has some forecasting built in. And on the back end, I already have various sources mirrored into my Fabric workspace in OneLake that are feeding into this particular view.

-I’m going to use Mirroring and create a new item to connect to Azure SQL DB and bring in data from the Supermarts in the same region. I’ll filter by mirror and then select the Azure SQL Database option. From here, I’ll add my connection details. I’ll type the database name, and the rest securely auto-completes. After I connect, it takes seconds to show the tables in the database. And from there it’s just one more click to create the mirrored database, and now it’s ready to use in OneLake. Just like Shortcuts, all of this works without ETL or moving the source data. And now if we go back to our Get data page, you’ll notice that most of the Azure databases are directly supported for Mirroring, as well as Snowflake.
That said, you aren’t limited to using Mirroring for just these sources. You’ll notice that I have two sources here, Salesforce and our legacy on-prem SQL database. These were brought into OneLake using open mirroring. Open mirroring is an API which lets users and data providers bring data in from any source while keeping it in sync. You can learn more about open mirroring at aka.ms/FabricOpenMirroring.

- So Mirroring then has great potential in terms of being a frictionless way to bring your data in. But how real-time then is the synchronization?

- It’s near real time. Once you’ve created the mirrored database and brought your data in, you don’t need to do anything else to keep the data fresh. On the backend, Fabric is continuously listening for changes and making updates to the data in OneLake. So I’ll go ahead and refresh my sales dashboard, and you can see the updates flow in. Our sales just quadrupled in seconds with this new database. That’s actually because we’ve added a lot more stores with their sales data.

- This is really a game changer then in terms of the time to insights, and that you have these low-friction ways to bring your data in. That said though, there are lots of cases where you might want to transform your data and need to do more data integration work before you bring it in.

-Right. And that’s where Data Factory in Microsoft Fabric comes in. It’s a powerful engine that can bring in your data at petabyte scale with everything you need to prep and transform the data, too. Let’s take a look. As you begin to create pipelines to bring in your data, you’ll see that we now have more than 180 connectors to the most common data types. And these span both Microsoft and non-Microsoft options. And connecting to one is like we showed before with Shortcuts. If I click on Snowflake, for example, I just need to add connection settings and valid credentials to add the data source to my pipeline. And from here, let me go deeper on the pipeline experience itself. Here is one that I’ve already started. It takes our Supermart data through the bronze and silver layers before landing the curated data in the gold layer. To gain a deeper understanding, we can actually use Copilot to generate a summary of what the pipeline is doing, and in seconds, as Copilot explains here, data is loaded before data is curated, and we have schema validation, which picks up on file mismatches and places them in a separate folder after sending an alert. The pipeline provides a visual view of all of these steps. Then if I move over to my notebook, you’ll see that it applies transformations on the data before it’s loaded into our gold layer.

-Now, once my data’s in OneLake, I can also start building out my own data flows. Here’s a table that I just pulled in from Excel that looks at grocery transactions over the past quarter. This table is currently super wide, making analysis very, very difficult. Here’s where the power of Copilot comes in. I don’t need to know the right buttons or terms or words. Sometimes it can just be as simple as describing how I want my tables to look, and I’ll submit this prompt, and almost instantly the table is transformed and more optimized for analysis. While I’m at it, I can also use Copilot to do a simple operation like renaming a column, and pay attention to the middle column. The name was just changed. But what if someone inherits this dataflow? Copilot can also provide descriptions of what your query is doing to help save time.
It’s described the query, and it’s easy for anyone to understand. And here’s the real power of everything we’ve done today. As you can see in our lineage, we now have all our connected data sources from Shortcuts, Mirroring, and now Data Factory. Not only can I now see everything connected in my dashboard, but I can also use natural language with built-in AI to ask questions of my data.

-In this case, I want to get ahead of wastage issues in our grocery department. My dashboard doesn’t quite help me here. This is where we can use the built-in AI to ask questions of the data. So I’ll go ahead and prompt it to find which products are at risk of spoilage and require discounting. It’ll take a sec, and once that completes, I’ll get a top-level view of the products at risk with details about their expiration dates. Under that, I can see the breakdown of its reasoning with a detailed table of each item with quantity per store. And there’s even the raw SQL query the agent used to derive these insights.

- And that was a really powerful example of what you can do once your data is in OneLake. But what if I’m not as close to the data and I want to be able to discover data that I have access to?

- OneLake has the OneLake catalog, which is a central place for data users to discover the data they need and manage the data they own. Let’s take a look. From the OneLake catalog, I can see everything I have access to. On the left, I can filter the views by my items, items endorsed by others on my team, favorites, and individual workspaces. At the top, I can also filter by different types of data artifacts, insights, and processes. Let’s take a look at the ask-questions AI experience I just showed, and here I can see the lineage for how the data’s coming in. That said, with all this ease of data discovery, it’s super important to control and manage access to the data that’s exposed through OneLake. And what’s great is that data compliance controls from Microsoft Purview are built in. I can see the sensitivity labels for any data asset, and from a lineage perspective, these labels are automatically inherited from upstream parent data sources. Permissions are also fully manageable, and if there’s a direct link to this artifact, I’ll be able to see it here. Under the direct access tab, I can see who and which groups have access to this data already. And as a data admin, I can also add users to grant access to specific resources. In fact, I’ll go ahead and add you to this one, Jeremy, and I can determine if you’re allowed to share it with others, edit, or even view the data itself.

- Okay, so now if we move over to my screen, I can see that the Ask Queue item has been shared with me, and it’s available right here. Now to show you the process to discover and request something, I’ll first filter data in my catalog view by semantic models just to narrow the list down a bit. And for items that you can see but not access, you’ll see this icon here, and there’s a button to request access, like with this operations model here. And when I use that, I can add a message for why I’m requesting and send it to the admin for that data to get their approval.

- And beyond access management, the integrations with Microsoft Purview for data security and compliance keep getting deeper. Also, there’s another option for bringing data into OneLake that we haven’t demonstrated, and that’s real-time streaming data.
That’s because there’s an entire show on how to do that using Real-Time Intelligence that you can check out at aka.ms/MechanicsRTI.

- It’s really great to see all the ways that you can bring quality data into OneLake for analytics and to ground your AI workloads. In fact, you can bring data in from OneLake for use with your gen AI apps and agents using Azure AI Foundry, which we’ll cover more in an upcoming show. So, Maraki, what do you recommend for all the people watching right now to learn more?

- It’s simple: you can try everything I showed today and everything else Fabric has to offer by signing up for a generous 60-day free trial. We don’t even require a credit card to get started.

- So now you have lots of options to bring data in and to start working with it. Thanks so much for joining us today, Maraki, and thank you for joining us to learn more about all the updates. Now, if you haven’t yet, be sure to subscribe to Microsoft Mechanics, and we’ll see you again soon.