In today's era of AI, data governance and security have become essential for businesses to safely derive insights and drive responsible innovation. This blog explores the challenges of an ever-growing data estate and showcases recent innovations in Microsoft Purview that enable organizations to navigate these modern challenges.
Sixty percent of CDOs cite data integration challenges as a top pain point because they lack knowledge of where relevant data resides [1]. Companies operate on multi-platform, multi-cloud data estates, making it harder than ever to seamlessly discover, secure, govern, and activate data. This increases the overall complexity of enabling users to responsibly derive insights and drive business value from data. In the era of AI, data governance is no longer an afterthought; data security and data governance are now both table stakes.
Data governance is not a new concept, but with the proliferation of AI and an evolving regulatory landscape, it is critical for safeguarding data related to AI-driven business innovation. With 95% of organizations implementing or developing an AI strategy [2], customers are facing emerging governance challenges, such as:
- False signals: A lack of clean, accurate data can cause false signals in AI, which can trigger consequential business outcomes or lead to incorrectly reported forecasts and regulatory fines.
- Time to insight: Data scientists and analysts spend 60-80% of their time on data access and preparation to feed AI initiatives, which leads to staff frustration, increased OPEX, and delays in critical AI innovation priorities.
- Shadow innovation: Data innovation outside governance can increase business risks around data leakage, oversharing, or inaccurate outcomes.
This is why federated governance has surfaced as a top priority among security and data leaders: it unlocks data innovation while maintaining appropriate data oversight to help minimize risks.
Customers are seeking more unified solutions that enable data security and governance seamlessly across their complex data estate. To help customers better respond to these needs, Microsoft Purview unifies data security, data governance, and data compliance solutions across the heterogeneous data estate for the era of AI. Microsoft Purview also works closely with Microsoft Fabric to integrate capabilities that help seamlessly secure and govern data to help reduce risks associated with data activation across the Microsoft Intelligent Data Platform and across the Microsoft Cloud portfolio.
Microsoft Fabric delivers a pre-integrated and optimized SaaS environment in which data teams can work together faster over secure and governed data. Combining the strengths of Microsoft Purview and Microsoft Fabric lets organizations more confidently use Fabric to unlock data innovation for data engineers, analysts, data scientists, and developers. At the same time, Purview enables data security teams to extend its advanced data security capabilities, and the central data office to extend its advanced data governance capabilities, across Fabric, Azure, M365, and the heterogeneous data estate.
Furthering this vision, today Microsoft is announcing: (1) a new name for the Purview Data Governance solution, Purview Unified Catalog, to better reflect its growing catalog capabilities; (2) integration with the new OneLake catalog; (3) a new data quality scan engine; (4) Purview Analytics in OneLake; and (5) expanded Data Loss Prevention (DLP) capabilities for Fabric lakehouse and semantic models.
- Introducing Unified Catalog: a new name for the visionary solution
The Microsoft Purview data governance solution, made generally available in September, delivers comprehensive visibility, data confidence, and responsible innovation for greater business value in the era of AI. The solution streamlines metadata from disparate catalogs and sources, like OneLake, Databricks Unity, and Snowflake Polaris, into a unified experience. To better reflect these customer benefits, Microsoft Purview Data Catalog is being renamed to Microsoft Purview Unified Catalog, reflecting its growing catalog capabilities, such as deeper data quality support for more cloud sources and Purview Analytics in OneLake.
A data catalog serves as a comprehensive inventory of an organization's data assets. As the Microsoft Purview Unified Catalog continues to add capabilities in curation, data quality, and third-party platform integration, the new name reflects its current cross-cloud capability, which is illustrated in the figure below. The data product shown contains data assets from multiple sources, including a Fabric lakehouse table, a Snowflake table, and an Azure Databricks table. With the proper curation of analytics into data products, data users can govern data assets more easily than ever.
Figure 1: Curation of a data product from disparate data sources within Purview’s Unified Catalog
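To make the idea of a cross-cloud data product more concrete, here is a minimal Python sketch of how such a grouping could be modeled. This is purely illustrative and is not the Purview data model or API: the `DataAsset` and `DataProduct` classes, their fields, and the sample asset names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataAsset:
    """One physical asset registered in a catalog (hypothetical model)."""
    name: str
    source: str      # e.g. "Fabric lakehouse", "Snowflake", "Azure Databricks"
    asset_type: str  # e.g. "table"


@dataclass
class DataProduct:
    """A curated bundle of assets serving one business purpose (hypothetical)."""
    name: str
    owner: str
    assets: list[DataAsset] = field(default_factory=list)

    def add_asset(self, asset: DataAsset) -> None:
        # Skip duplicates so the same asset is not listed twice.
        if asset not in self.assets:
            self.assets.append(asset)

    def sources(self) -> set[str]:
        # Distinct platforms that contribute assets to this product.
        return {a.source for a in self.assets}


# Sample names below are invented for illustration only.
product = DataProduct(name="Corporate Emissions", owner="data-office@example.com")
product.add_asset(DataAsset("emissions_raw", "Fabric lakehouse", "table"))
product.add_asset(DataAsset("supplier_dim", "Snowflake", "table"))
product.add_asset(DataAsset("energy_usage", "Azure Databricks", "table"))
print(sorted(product.sources()))
```

The point of the sketch is that a data product is a logical unit layered over assets that stay where they are; the catalog records which sources contribute to it.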
- Introducing OneLake catalog (Preview)
As announced in the Microsoft Fabric blog earlier today, the OneLake catalog is a solution purpose-built for data engineers, data scientists, developers, analysts, and data consumers to explore, manage, and govern data in Fabric.
The new OneLake catalog works with Purview by seamlessly connecting data assets governed by OneLake catalog into Purview Unified Catalog, enabling the central data office to govern and manage data assets in one place. The Purview Unified Catalog offers data stewards and data owners advanced capabilities for data curation, advanced data quality, end-to-end data lineage, and an intuitive global catalog that spans the data estate. For data leaders, Unified Catalog offers built-in reports for actionable insights into data health and risks and the ability to confidently govern data across the heterogeneous data estate. In Figure 2, you can see how Fabric data is seamlessly curated into the "Corporate Emissions Created by AI for CY2024" data product, built with data assets from OneLake.
Figure 2: Data product curated with Fabric assets
- Introducing a new data quality scan engine for deeper data quality (Preview)
Purview offers deeper data quality support through a new data quality scan engine for big data platforms, including Microsoft Fabric, Databricks Unity Catalog, Snowflake, Google BigQuery, and Amazon S3, supporting open standard file and table formats. In short, this new scan engine allows businesses to centrally perform rich data quality management from within the Purview Unified Catalog.
Figure 3 shows how users can run different data quality rules on a particular asset, in this case a table hosted in OneLake. When users click "run quality scan", the scanner runs a deep scan on the data itself, evaluating the data quality rules in real time and updating the quality score for that asset.
Figure 3: Running a data quality scan on an asset living in OneLake
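As a rough illustration of what rule-based quality scoring involves, the following Python sketch evaluates a few rules against sample rows and derives a score. It is not the Purview scan engine or its scoring formula: the rule names, the `run_quality_scan` function, and the choice of an unweighted mean for the overall score are all assumptions made for this example.

```python
from typing import Callable

Row = dict[str, object]
Rule = Callable[[Row], bool]

# Hypothetical rules; real rule types and names in Purview may differ.
RULES: dict[str, Rule] = {
    "customer_id_not_empty": lambda r: r.get("customer_id") not in (None, ""),
    "amount_non_negative": lambda r: isinstance(r.get("amount"), (int, float))
    and r["amount"] >= 0,
    "country_is_two_letters": lambda r: isinstance(r.get("country"), str)
    and len(r["country"]) == 2,
}


def run_quality_scan(rows: list[Row], rules: dict[str, Rule]) -> dict[str, float]:
    """Return the pass rate (0-100) for each rule, plus an overall score."""
    if not rows or not rules:
        return {"overall": 0.0}
    results: dict[str, float] = {}
    for name, rule in rules.items():
        passed = sum(1 for row in rows if rule(row))
        results[name] = 100.0 * passed / len(rows)
    # Assumption: the overall score is the unweighted mean of rule pass rates.
    results["overall"] = sum(results.values()) / len(rules)
    return results


# Invented sample data: each rule is violated by exactly one of four rows.
rows = [
    {"customer_id": "C1", "amount": 10.0, "country": "US"},
    {"customer_id": "", "amount": 5.0, "country": "DE"},
    {"customer_id": "C3", "amount": -2.0, "country": "USA"},
    {"customer_id": "C4", "amount": 7.5, "country": "FR"},
]
scores = run_quality_scan(rows, RULES)
print(scores)
```

Keeping a per-rule pass rate alongside the overall score makes it easier to see which rule is dragging the score down, which is the kind of drill-down a steward would need before remediating an asset.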
- Introducing Purview Analytics in OneLake (Preview)
To further an organization’s data quality management practice, data stewards can now use a new Purview Analytics in OneLake capability, in preview, to extract tenant-specific metadata from the Purview Unified Catalog and publish it to OneLake. This new capability enables deeper data quality and lineage investigation using the rich capabilities in Power BI within Microsoft Fabric.
Figure 4: In Unified Catalog settings, a user can add self-serve analytics to Microsoft Fabric
Figure 5: Curated metadata from Purview within Fabric
- Expanded Data Loss Prevention (DLP) capabilities for Fabric lakehouse and semantic models
To broaden Purview data security features for Fabric, today we are announcing that the restrict access action in Purview DLP policies now extends to Fabric semantic models. With the restrict access action, DLP admins can configure policies to detect sensitive information in semantic models and limit access to only internal users or data owners. This control is valuable when a Fabric tenant includes guest users and you want to limit unnecessary access to internal proprietary data. The restrict access action for Fabric semantic models augments the existing ability, announced earlier this year, to detect uploads of sensitive data to Fabric lakehouses. Learn more about the new Purview DLP capabilities for Fabric lakehouses and semantic models in the DLP blog.
Figure 6: Example of restricted access to a Fabric semantic model enforced through a Purview DLP policy.
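The decision logic behind a restrict access action can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not how Purview DLP is implemented or configured: the `User` class, the `can_access` function, the `restrict_to` parameter, and the single regex standing in for a sensitive information type are all hypothetical.

```python
import re
from dataclasses import dataclass

# Assumption: a simplistic stand-in for a sensitive information type.
# Real DLP classifiers are far more sophisticated than a single regex.
CARD_LIKE = re.compile(r"\b(?:\d[ -]?){13,16}\b")


@dataclass(frozen=True)
class User:
    email: str
    is_guest: bool


def contains_sensitive_data(values: list[str]) -> bool:
    """Return True if any sampled value looks like a payment card number."""
    return any(CARD_LIKE.search(v) for v in values)


def can_access(
    user: User,
    owners: set[str],
    sample_values: list[str],
    restrict_to: str = "internal",
) -> bool:
    """Hypothetical policy check for one semantic model.

    If no sensitive data is detected, everyone keeps access. Otherwise,
    access is limited to internal users ("internal") or data owners ("owners").
    """
    if not contains_sensitive_data(sample_values):
        return True
    if restrict_to == "owners":
        return user.email in owners
    return not user.is_guest


# Invented users and values for illustration only.
guest = User("guest@partner.example", is_guest=True)
analyst = User("bob@contoso.example", is_guest=False)
owners = {"ana@contoso.example"}
sensitive = ["card 4111 1111 1111 1111"]

print(can_access(guest, owners, sensitive))                           # guest is blocked
print(can_access(analyst, owners, sensitive))                         # internal user allowed
print(can_access(analyst, owners, sensitive, restrict_to="owners"))   # non-owner is blocked
```

The sketch separates detection (does the model contain sensitive data?) from enforcement (who may still see it?), which mirrors the two parts of the policy described above.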
Summary
With these investments in security and governance, Microsoft Purview is delivering on its vision to extend data protection value and innovation across your heterogeneous data estate, reducing complexity and improving risk mitigation. Together, Purview and Fabric set the foundations for a modern intelligent data platform with seamless security and governance to drive AI innovation you can trust.
Learn more
As we continue to innovate our products to expand the security and governance capabilities, check out these resources to stay informed.
- https://aka.ms/Try-Purview-Governance
- https://www.microsoft.com/en-us/security/business/microsoft-purview
- https://aka.ms/try-fabric
[1] Codvo Marketing, "Top 7 Challenges in Data Integration and How to Solve Them," Medium
[2] Microsoft internal research May 2023, N=638