Exporting Microsoft Purview Data Assets using the REST API

Question

Exporting data assets from Microsoft Purview through its REST API in Python gives you a programmatic way to retrieve asset metadata. Because the export is scripted, it can be automated and fed into other data management and analytics workflows.
I performed a sample search on the Microsoft Purview governance portal using "*" as the keyword to generate a list of all data assets. The accompanying screenshot from the Purview portal serves as a reference.

The generated CSV file shown below is the output obtained from Microsoft Purview using the REST API.

Here's a guide on exporting data assets from Microsoft Purview using the REST API in Python.
To access Microsoft Purview through the Python SDK, install the following PyPI packages:
pip install azure-identity
pip install azure-purview-scanning
pip install azure-purview-administration
pip install azure-purview-catalog
pip install azure-purview-account
pip install azure-core
pip install pandas
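
Before running the export, it can help to confirm that the packages above are actually installed in the active environment. The following is a minimal sketch using only the standard library; the helper name `installed_versions` is my own and is not part of any Purview SDK.

```python
from importlib import metadata

# Package names taken from the pip commands above.
REQUIRED_PACKAGES = [
    "azure-identity",
    "azure-purview-scanning",
    "azure-purview-administration",
    "azure-purview-catalog",
    "azure-purview-account",
    "azure-core",
    "pandas",
]


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions(REQUIRED_PACKAGES).items():
    print(f"{name}: {version or 'NOT INSTALLED'}")
```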
Important
Your endpoint value depends on which Microsoft Purview portal you are using.
Endpoint for the classic Microsoft Purview governance portal: https://{your_purview_account_name}.purview.azure.com/
Endpoint for the new Microsoft Purview portal: https://api.purview-service.microsoft.com
Scan endpoint for the classic Microsoft Purview governance portal: https://{your_purview_account_name}.scan.purview.azure.com/
Scan endpoint for the new Microsoft Purview portal: https://api.scan.purview-service.microsoft.com
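
The endpoint choice above can be captured in a small helper so that the rest of a script does not hard-code URLs. This is only a sketch: the function name `purview_endpoints` and its `new_portal` flag are hypothetical, and the URLs are the ones listed in this note.

```python
def purview_endpoints(account_name=None, new_portal=False):
    """Return the catalog and scan endpoints for the chosen portal."""
    if new_portal:
        # The new portal uses fixed, account-independent hosts.
        return {
            "endpoint": "https://api.purview-service.microsoft.com",
            "scan_endpoint": "https://api.scan.purview-service.microsoft.com",
        }
    if not account_name:
        raise ValueError("account_name is required for the classic portal")
    # The classic portal embeds the account name in the host.
    return {
        "endpoint": f"https://{account_name}.purview.azure.com/",
        "scan_endpoint": f"https://{account_name}.scan.purview.azure.com/",
    }
```

For example, `purview_endpoints("contoso")["endpoint"]` yields the classic endpoint for a hypothetical account named `contoso`.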
To create a Service Principal and grant Data Reader or Data Curator access to the Service Principal at the Microsoft Purview Collection Level, please refer to the instructions provided [here].
keywords = "*"
tenant_id = "<Please update the Microsoft Purview tenant ID here>"
client_id = "<Please provide the updated Service Principal client ID that has access to the Microsoft Purview account>"
client_secret = "<Please update the Service Principal client secret for the aforementioned client ID>"
purview_endpoint = "https://<Please provide the name of the Microsoft Purview account>.purview.azure.com/"
purview_scan_endpoint = "https://<Please provide the name of the Microsoft Purview account>.scan.purview.azure.com/"
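
If you would rather not keep the client secret in the notebook itself, one option is to read these values from environment variables. The sketch below assumes variable names of my own choosing (`PURVIEW_TENANT_ID`, `PURVIEW_CLIENT_ID`, `PURVIEW_CLIENT_SECRET`, `PURVIEW_ACCOUNT_NAME`); they are not defined by Purview or the Azure SDK, and the endpoints are built for the classic portal.

```python
import os

# Hypothetical variable names; pick whatever fits your environment.
REQUIRED_VARS = [
    "PURVIEW_TENANT_ID",
    "PURVIEW_CLIENT_ID",
    "PURVIEW_CLIENT_SECRET",
    "PURVIEW_ACCOUNT_NAME",
]


def load_purview_settings(env=None):
    """Build the settings used by the export script from environment variables."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise ValueError("Missing environment variables: " + ", ".join(missing))
    account = env["PURVIEW_ACCOUNT_NAME"]
    return {
        "tenant_id": env["PURVIEW_TENANT_ID"],
        "client_id": env["PURVIEW_CLIENT_ID"],
        "client_secret": env["PURVIEW_CLIENT_SECRET"],
        # Classic-portal endpoints, derived from the account name.
        "purview_endpoint": f"https://{account}.purview.azure.com/",
        "purview_scan_endpoint": f"https://{account}.scan.purview.azure.com/",
    }
```

Calling `load_purview_settings()` with no arguments reads from the process environment and raises a `ValueError` that names any variable that is not set.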
Retrieve the entire notebook file from [GitHub].
from azure.purview.catalog import PurviewCatalogClient
from azure.identity import ClientSecretCredential
from azure.core.exceptions import HttpResponseError
import pandas as pd

# Search keyword; "*" matches every asset, as in the portal search above.
keywords = "*"
export_csv_path = "purview_search_export.csv"

tenant_id = "<Please update the Microsoft Purview tenant ID here>"
client_id = "<Please provide the updated Service Principal client ID that has access to the Microsoft Purview account>"
client_secret = "<Please update the Service Principal client secret for the aforementioned client ID>"
purview_endpoint = "https://<Please provide the name of the Microsoft Purview account>.purview.azure.com/"
purview_scan_endpoint = "https://<Please provide the name of the Microsoft Purview account>.scan.purview.azure.com/"

def get_credentials():
    # Authenticate as the Service Principal.
    return ClientSecretCredential(client_id=client_id, client_secret=client_secret, tenant_id=tenant_id)

def get_catalog_client():
    credentials = get_credentials()
    return PurviewCatalogClient(endpoint=purview_endpoint, credential=credentials, logging_enable=True)

body_input = {
    "keywords": keywords
}

try:
    # Create the client and run the search in one block, so that a failed
    # client creation cannot lead to a query on an undefined client.
    catalog_client = get_catalog_client()
    response = catalog_client.discovery.query(search_request=body_input)
    # The matching assets are listed under the "value" key of the response.
    jdf = pd.json_normalize(response["value"])
    jdf.to_csv(export_csv_path, index=False)
except (ValueError, HttpResponseError) as e:
    print(e)
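
If the catalog holds more assets than a single search response returns, the query has to be repeated page by page. The sketch below shows one way to do that; it assumes the search request body accepts `limit` and `offset` fields, which you should verify against the Discovery Query reference for the API version you use, since other versions may page differently. The helper name `search_all` and its parameters are my own. It takes the query function as an argument so it can be tried without a live Purview account.

```python
def search_all(query_fn, keywords="*", page_size=100, max_pages=1000):
    """Collect search results page by page until a short page is returned.

    query_fn is any callable that takes a request body (dict) and returns
    a response dict with the results under the "value" key.
    """
    results = []
    for page in range(max_pages):
        body = {
            "keywords": keywords,
            "limit": page_size,           # assumed request field
            "offset": page * page_size,   # assumed request field
        }
        response = query_fn(body)
        batch = response.get("value") or []
        results.extend(batch)
        # A page smaller than page_size means there is nothing left to fetch.
        if len(batch) < page_size:
            break
    return results
```

With the client from the script above, this could be called as `search_all(lambda body: catalog_client.discovery.query(search_request=body))`, and the returned list passed to `pd.json_normalize` before writing the CSV.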
The provided Python notebook or script exports the following columns to the output CSV file.

endorsement
collectionId
updateTime
name
description
displayText
label
sensitivityLabelId
objectType
isIndexed
assetType
@search.score
updateBy
qualifiedName
createBy
owner
id
entityType
createTime
classification
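
To check an exported file against the column list above, the header row can be compared with the expected names. This is a minimal standard-library sketch; the helper name `missing_columns` is hypothetical. Which columns actually appear may depend on the fields present in your search results, so a missing column is not necessarily an error.

```python
import csv
import io

# Column names as listed above.
EXPECTED_COLUMNS = [
    "endorsement", "collectionId", "updateTime", "name",
    "description", "displayText", "label", "sensitivityLabelId",
    "objectType", "isIndexed", "assetType", "@search.score",
    "updateBy", "qualifiedName", "createBy", "owner",
    "id", "entityType", "createTime", "classification",
]


def missing_columns(csv_file, expected=EXPECTED_COLUMNS):
    """Return the expected column names that are absent from the CSV header."""
    header = next(csv.reader(csv_file), [])
    return [name for name in expected if name not in header]


# Small in-memory example; a real export would be opened with
# open("purview_search_export.csv", newline="", encoding="utf-8").
sample = io.StringIO("id,name,qualifiedName\n1,orders,mssql://server/db/orders\n")
print(missing_columns(sample))
```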

Additional Reference: Exploring Purview’s REST API with Python (microsoft.com)

yuvaraja1 · Answer

Hi nsakthi,
I am trying to create a new term in the glossary in the new Microsoft Purview portal using a Python script. However, the term is created in the classic portal instead. Below is the URL I am using in the Python script:
https://api.purview-service.microsoft.com
Thanks in advance!

nsakthi · Answer

Hi Yuvaraja1
Kindly check your firewall settings and API version. Additionally, if you have Private Endpoints enabled, please verify your DNS configuration.
https://docs.azure.cn/en-us/purview/migrate-to-governance-private-endpoints#:~:text=If%20you%20configured%20firewall%20allowlist%20rules%20for%20your%20account%20endpoints
Configure Microsoft Purview firewall - Microsoft Purview | Azure Docs
Confirm that your firewall allows these global and tenant-specific endpoints (replacing accountname and tenantid with your values):

api.purview-service.microsoft.com
accountname.purview.azure.com
tenantid-api.purview-service.microsoft.com
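
As a rough way to test the three hosts above from the machine that runs the script, something like the following could be used. It is a sketch only: the function names are my own, and a successful TCP connection on port 443 shows that the host is reachable, not that the API call will be authorized.

```python
import socket


def purview_hosts(account_name, tenant_id):
    """Fill the account name and tenant ID into the host patterns listed above."""
    return [
        "api.purview-service.microsoft.com",
        f"{account_name}.purview.azure.com",
        f"{tenant_id}-api.purview-service.microsoft.com",
    ]


def can_connect(host, port=443, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, looping over `purview_hosts("<accountname>", "<tenantid>")` and printing `can_connect(host)` for each one gives a quick view of which hosts the firewall or DNS configuration is blocking.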

I hope this helps.

yuvaraja1 · Answer

Hi nsakthi,
Thank you for your response. I verified that the endpoint is a public endpoint, not a private endpoint, and then tried the three URLs you shared, but they all redirected me to the classic portal.
