Data curation: Discover more with data estate insights in Microsoft Purview
Published Oct 26 2022 09:46 AM
Microsoft

Let's start with an introduction to the data curation process. In data governance practice, data curation is the end-to-end process of preparing and managing data. As an outcome of this process, business users can better understand and use data in their daily activities.

In Microsoft Purview, every data asset falls into one of three buckets, "Fully curated", "Partially curated", or "Not curated", depending on which attributes the asset has. An asset is "Fully curated" if it has at least one classification tag, an assigned Data Owner, and a description. If some, but not all, of these attributes are missing, the asset is "Partially curated". If all of them are missing, it is "Not curated".
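The three-attribute rule above can be sketched in a few lines of Python. This is only an illustration of the logic as described in this article: the `Asset` class and its field names are hypothetical and do not reflect the actual Purview asset schema or API.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Asset:
    # Hypothetical, simplified asset record (not the Purview schema).
    name: str
    classifications: List[str] = field(default_factory=list)
    owner: Optional[str] = None
    description: Optional[str] = None


def curation_state(asset: Asset) -> str:
    """Return the curation bucket for an asset using the three-attribute rule."""
    present = [
        len(asset.classifications) > 0,                       # at least one classification
        bool(asset.owner and asset.owner.strip()),            # an assigned Data Owner
        bool(asset.description and asset.description.strip()),  # a description
    ]
    if all(present):
        return "Fully curated"
    if any(present):
        return "Partially curated"
    return "Not curated"
```

For example, an asset with an owner but no classification and no description would land in the "Partially curated" bucket.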

 

Microsoft Purview offers a feature for viewing the state of assets in the data catalog and tracking the full data curation process. This feature is called "Data estate insights" and can be accessed from the left menu in the Microsoft Purview governance portal.

 

At the top of the page, this view includes key performance indicators on the general state of the catalog:

  • The curation state of the resources ("Asset curation")
  • The assignment of owners to assets ("Asset data ownership")
 

The "Data stewardship" view shows the status of the catalog after scans and classifications have occurred.

 

Below is a sample of the view for a whole organization:

 

PurviewInsightsDataEstate.png

 

The report presents its data by collection. In this example, we focus on the Canada collection by using the "Collection" filter on the report's visuals.

 

PurviewInsightsCurationScope.png

 

This gives us an overview of the catalog status for the collections owned by Canada. Clicking the "View details" link displays a detailed report that gives you visibility at the sub-collection level.

 

This detailed view lets you quickly see which collection to process first. Clicking the name of a collection shows the details of all the assets in that collection and in all of its sub-collections. For example, clicking the Canada collection lists the assets in Canada and its sub-collections:
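To make the "collection plus all of its sub-collections" scope concrete, here is a minimal sketch of how such a roll-up could be computed over a collection tree. It is not how Purview implements the report: the `PARENTS` mapping, the sub-collection names (Quebec, Ontario), and the asset dictionaries are all hypothetical sample data.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set

# Hypothetical collection hierarchy: child -> parent (None marks the root).
PARENTS: Dict[str, Optional[str]] = {
    "Contoso": None,
    "Canada": "Contoso",
    "Quebec": "Canada",
    "Ontario": "Canada",
    "France": "Contoso",
}


def collection_scope(root: str, parents: Dict[str, Optional[str]]) -> Set[str]:
    """Return the root collection together with all of its sub-collections."""
    children = defaultdict(list)
    for child, parent in parents.items():
        if parent is not None:
            children[parent].append(child)

    scope: Set[str] = set()
    stack = [root]
    while stack:
        current = stack.pop()
        if current in scope:
            continue  # guard against accidental cycles in the mapping
        scope.add(current)
        stack.extend(children[current])
    return scope


def assets_in_scope(root: str, parents: Dict[str, Optional[str]],
                    assets: List[dict]) -> List[dict]:
    """Keep only the assets whose collection is the root or one of its sub-collections."""
    scope = collection_scope(root, parents)
    return [asset for asset in assets if asset["collection"] in scope]


# Hypothetical sample assets.
ASSETS = [
    {"name": "customers", "collection": "Quebec"},
    {"name": "orders", "collection": "Canada"},
    {"name": "clients", "collection": "France"},
]
canada_assets = assets_in_scope("Canada", PARENTS, ASSETS)
```

With this sample data, `canada_assets` contains the assets from Canada and Quebec but not the one from France.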

 

This view offers a detailed list of all assets for a given collection, and it can be filtered as needed. The available filters are:

  • Collections
  • Data source types
  • Instances
  • Classification
  • Data owner
 

Below is a view showing only the assets without a classification for the Canada collection (and its sub-collections). Another useful feature is that the data in this view can be exported in CSV format.

 

Exporting to CSV makes it quick to tell data managers which assets need updating. The CSV can also be fed into quality monitoring or reporting tools such as Power BI or Microsoft Excel. Below are some sample reports:

 

With Power BI:

 

PurviewInsightsPBI.png

 

With Microsoft Excel:

 
 

PurviewInsightsExcel.png
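The exported CSV can also be summarized with a short script, for instance to count unclassified assets per collection before handing the list to data managers. The sketch below uses only the Python standard library. The column names ("Collection", "Classification") and the sample rows are assumptions made for illustration, so check them against the header row of your own export.

```python
import csv
import io
from collections import Counter

# Assumed column names; the actual export header may differ.
COLLECTION_COL = "Collection"
CLASSIFICATION_COL = "Classification"


def unclassified_per_collection(csv_file) -> Counter:
    """Count the rows with an empty classification, grouped by collection."""
    counts: Counter = Counter()
    for row in csv.DictReader(csv_file):
        classification = (row.get(CLASSIFICATION_COL) or "").strip()
        if not classification:
            collection = (row.get(COLLECTION_COL) or "").strip()
            counts[collection] += 1
    return counts


# Hypothetical sample standing in for an exported file.
sample = io.StringIO(
    "Asset name,Collection,Classification,Data owner\n"
    "customers,Canada,Email Address,alice\n"
    "orders,Canada,,\n"
    "invoices,Quebec,,bob\n"
)
result = unclassified_per_collection(sample)
```

To run it on a real export, replace `sample` with a file opened via `open(path, newline="", encoding="utf-8")`, where `path` points to your CSV.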

 

This article describes how to perform data curation inside your organization. Another article prepared by @Bartek_Graczyk will cover how Purview helps with data catalog adoption.

 
