Microsoft Security Community Blog
Automate bulk metadata updates in Microsoft Purview Data Map using Azure Functions and AI

Aishwarya_Dinde
Jul 24, 2025

Automate and streamline bulk metadata updates in Microsoft Purview Data Map using serverless Azure Functions. Enhance accuracy and efficiency by integrating AI for intelligent metadata extraction and mapping.

To automate metadata updates, the solution leverages:

  • Azure Functions: Serverless compute to execute update logic on a schedule or in response to an event.
  • Microsoft Purview REST API: Enables programmatic access to search, update, and manage metadata assets.
  • Managed Identity: Allows secure, password-less authentication to Microsoft Purview and other Azure resources.

 

This architecture eliminates the need to store secrets or credentials, keeping security and compliance at the forefront.

Architecture Overview
  1. Data Source
    Structured metadata (e.g., in CSV or JSON) is uploaded to Azure Blob Storage.
  2. Azure Function with Timer Trigger
    The function executes on a schedule, reads metadata from Blob Storage, and sends authenticated update requests to the Purview API.
  3. Purview API with Secure Access
    Metadata updates are processed in bulk using the REST API, authorized by a Managed Identity token.
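
To make the first two steps more concrete, here is a minimal sketch of how the function might turn a CSV file's text into per-asset records once it has been read from Blob Storage. The column names (`qualifiedName`, `typeName`, `description`) and the sample rows are hypothetical and are not taken from the author's file format; adjust them to match your own metadata file.

```python
import csv
import io


def parse_metadata_csv(csv_text: str) -> list[dict[str, str]]:
    """Parse CSV text into a list of row dictionaries, trimming whitespace."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = []
    for row in reader:
        # Skip overflow cells (key is None) and treat missing cells as empty strings.
        rows.append({
            key.strip(): (value or "").strip()
            for key, value in row.items()
            if key is not None
        })
    return rows


# Hypothetical file contents; real column names depend on your metadata file.
sample_csv = """qualifiedName,typeName,description
https://contoso.blob.core.windows.net/sales/orders.csv,azure_blob_path,Daily order export
https://contoso.blob.core.windows.net/sales/customers.csv,azure_blob_path,Customer master data
"""

records = parse_metadata_csv(sample_csv)
for record in records:
    print(record["qualifiedName"], "->", record["description"])
```

Keeping the parsing step free of any Azure calls makes it easy to unit test separately from the storage and Purview integration.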

 

Metadata sample format:

 

Step-by-Step Implementation
  1. Configure Permissions and Authentication
  • Assign a System-Assigned or User-Assigned Managed Identity to the Azure Function.
  • In Microsoft Purview, assign the Data Curator role to this identity.
  • In Azure Blob Storage, grant the identity the Storage Blob Data Reader role so it can read the source files. The control-plane Reader role alone does not grant access to blob contents.

 

🔐 Sample Token Acquisition Code Using Managed Identity

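import requests  # used for the Purview API calls that follow
from azure.identity import ManagedIdentityCredential

# Replace with your Purview account name
purview_account_name = "your-purview-account"
purview_endpoint = f"https://{purview_account_name}.purview.azure.com"

# Purview data-plane tokens are requested for the shared resource
# https://purview.azure.net, not for the account-specific endpoint
purview_scope = "https://purview.azure.net/.default"

# Get an access token for the Function's managed identity
# (for a user-assigned identity, pass client_id=... to ManagedIdentityCredential)
credential = ManagedIdentityCredential()
token = credential.get_token(purview_scope)

headers = {
    "Authorization": f"Bearer {token.token}",
    "Content-Type": "application/json"
}

# Use `headers` in your API call to Purview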

This avoids hardcoding any secrets and leverages Azure's built-in security.
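
As a rough illustration of what the authenticated call might carry, the sketch below builds an Atlas-style bulk entity payload and the request URL. Several details here are assumptions rather than the author's implementation: the `/datamap/api/atlas/v2/entity/bulk` path, the use of `userDescription` as the target attribute, and the record keys (`typeName`, `qualifiedName`, `name`, `description`) are all illustrative, so verify them against the current Purview Data Map API reference and try them on a non-production asset first.

```python
import json


def build_bulk_payload(records):
    """Convert parsed metadata rows into an Atlas-style bulk entity payload."""
    entities = []
    for record in records:
        entities.append({
            "typeName": record["typeName"],
            "attributes": {
                # qualifiedName identifies the existing asset to update.
                "qualifiedName": record["qualifiedName"],
                "name": record["name"],
                # Assumption: the description is written to `userDescription`.
                "userDescription": record["description"],
            },
        })
    return {"entities": entities}


def bulk_endpoint(account_name):
    """Return the assumed bulk entity URL for a Purview account."""
    return f"https://{account_name}.purview.azure.com/datamap/api/atlas/v2/entity/bulk"


# Hypothetical record, shaped like one row of the metadata file.
sample_records = [{
    "typeName": "azure_blob_path",
    "qualifiedName": "https://contoso.blob.core.windows.net/sales/orders.csv",
    "name": "orders.csv",
    "description": "Daily order export",
}]

payload = build_bulk_payload(sample_records)
print(bulk_endpoint("your-purview-account"))
print(json.dumps(payload, indent=2))
```

With the `headers` dictionary from the token example, the payload could then be sent with `requests.post(bulk_endpoint(purview_account_name), headers=headers, json=payload, timeout=30)`.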

 

  2. Schedule the Function

Azure Functions timer triggers use six-field NCRONTAB expressions (CRON syntax with a leading seconds field) for flexible scheduling. Example (run every 5 minutes):

0 */5 * * * *

You can adjust the frequency, or use an event-based trigger to run the function after a Purview scan completes.
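
Because the timer schedule has six fields rather than the five of classic cron, it is easy to misplace a value. The small helper below is only an illustrative sketch (the function name `describe_schedule` is made up for this example); it labels each field of an expression so you can confirm that `*/5` sits in the minute position.

```python
# Field order used by Azure Functions timer schedules (NCRONTAB).
NCRONTAB_FIELDS = ("second", "minute", "hour", "day", "month", "day_of_week")


def describe_schedule(expression: str) -> dict[str, str]:
    """Map each part of a six-field schedule expression to its field name."""
    parts = expression.split()
    if len(parts) != len(NCRONTAB_FIELDS):
        raise ValueError(
            f"expected {len(NCRONTAB_FIELDS)} fields, got {len(parts)}"
        )
    return dict(zip(NCRONTAB_FIELDS, parts))


fields = describe_schedule("0 */5 * * * *")
for name, value in fields.items():
    print(f"{name:12} {value}")
```

A five-field expression such as `*/5 * * * *` is rejected by this helper, which is a quick way to catch schedules copied from standard cron.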

 

  3. Optional Enhancements
  • 🔍 AI-Generated Metadata
    Use OpenAI to generate meaningful descriptions or summaries from asset or column names before updating Purview.
  • 📊 Excel or CSV Mapping
    Maintain a reference file in Blob Storage for asset names and target metadata fields.
  • 📈 Monitoring and Logging
    Enable Azure Application Insights to monitor execution success, failures, and detailed logs.
Best Practices

Validate Metadata Inputs
Ensure required fields are present before making API calls.
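
One way to apply this check is to split the rows into valid and invalid sets before any request is sent. The sketch below assumes a hypothetical set of required fields (`qualifiedName`, `typeName`, `description`); substitute whatever your own metadata file requires.

```python
# Hypothetical required columns; adjust to your metadata file.
REQUIRED_FIELDS = ("qualifiedName", "typeName", "description")


def split_valid_invalid(records):
    """Return (valid, invalid), where invalid pairs each record with its missing fields."""
    valid, invalid = [], []
    for record in records:
        missing = [
            field for field in REQUIRED_FIELDS
            if not str(record.get(field) or "").strip()
        ]
        if missing:
            invalid.append((record, missing))
        else:
            valid.append(record)
    return valid, invalid


rows = [
    {"qualifiedName": "https://contoso.blob.core.windows.net/a.csv",
     "typeName": "azure_blob_path", "description": "File A"},
    {"qualifiedName": "https://contoso.blob.core.windows.net/b.csv",
     "typeName": "azure_blob_path", "description": "   "},
    {"typeName": "azure_blob_path", "description": "No qualified name"},
]

valid_rows, invalid_rows = split_valid_invalid(rows)
for record, missing in invalid_rows:
    print("Skipping row, missing:", ", ".join(missing))
```

Only `valid_rows` would be passed on to the update step; the invalid rows can be logged for review.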

Use Batching
Group API calls into batches to optimize performance and avoid throttling.
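
A simple chunking helper is usually enough to implement batching. This is a generic sketch; the batch size of 3 is arbitrary for the demonstration, and a suitable size for Purview requests is something to determine by testing against your own account.

```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")


def chunked(items: Sequence[T], batch_size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of `items`, each with at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Seven placeholder items split into batches of three.
batches = list(chunked(list(range(7)), 3))
print(batches)  # [[0, 1, 2], [3, 4, 5], [6]]
```

Each batch would then become one bulk request, rather than one request per asset.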

Secure Your Data
Restrict Blob Storage access using Azure RBAC. Never hardcode secrets—always use Managed Identity.

Retry Logic and Error Handling
Incorporate retry mechanisms for transient API errors and log failed entries for review.
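
Retry logic can be sketched as a small wrapper with exponential backoff. Everything named here is illustrative: `TransientError` is a hypothetical exception your own code would raise for retryable responses (for example HTTP 429 or 503), and the attempt count and delays are placeholder values.

```python
import time


class TransientError(Exception):
    """Hypothetical marker for failures that are worth retrying."""


def call_with_retry(operation, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call `operation`, retrying on TransientError with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts: let the caller log the failed entry
            sleep(base_delay * 2 ** (attempt - 1))


# Demonstration: an operation that fails twice and then succeeds.
calls = {"count": 0}


def flaky_update():
    calls["count"] += 1
    if calls["count"] < 3:
        raise TransientError("simulated throttling")
    return "ok"


delays = []  # record the delays instead of actually sleeping
result = call_with_retry(flaky_update, sleep=delays.append)
print(result, delays)  # ok [1.0, 2.0]
```

Passing `sleep` as a parameter keeps the wrapper testable without real waiting; in the function itself the default `time.sleep` applies.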

Conclusion

By combining Azure Functions, Microsoft Purview REST API, and Managed Identity, you can build a secure, scalable, and automated solution for bulk metadata updates.

This approach ensures:

  • Consistent metadata quality
  • Reduced manual effort
  • Improved data discoverability and compliance
  • Enterprise-grade security with no secrets

The code repository is available on GitHub: PurviewDataMapMetadata

Updated Jul 23, 2025
Version 1.0