Forum Discussion

southpawmurph
Copper Contributor
May 15, 2026

Data System Wide Lineage via API Request

I'm struggling to find a solution. My goal is to identify all existing lineage relationships for the data objects that belong to a specific data system. I've been using the Purview REST API (Datamap Dataplane), but I haven't found an endpoint that returns data system wide lineage/relationships.

For my scenario, I have a Databricks metastore and need to know the existing lineage relationships of its data objects within Purview so I can purge them during our scheduled lineage refresh.

1 Reply

  • Hi,

    There is no system-wide lineage endpoint, but here is an efficient approach for Databricks.

    Step 1: Enumerate assets with a qualifiedName prefix filter

    Use the Search API scoped to your metastore rather than just filtering by typeName:

    POST /datamap/api/atlas/v2/search/query
    {
      "keywords": null,
      "limit": 1000,
      "offset": 0,
      "filter": {
        "attributeName": "qualifiedName",
        "operator": "startswith",
        "value": "databricks://<your-metastore-id>"
      }
    }

    Paginate with offset until all GUIDs are collected.
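
    The offset loop could be sketched roughly as below. This is a minimal sketch, not a tested client: collect_guids and the injected post_search callable are hypothetical names, and the response shape (a "value" list whose items carry an "id" GUID) is an assumption to check against your own responses. The request body mirrors the one shown above.

```python
from typing import Callable, List


def collect_guids(
    post_search: Callable[[dict], dict],
    qualified_name_prefix: str,
    page_size: int = 1000,
) -> List[str]:
    """Page through search results by offset and return every GUID found.

    post_search is a hypothetical callable that sends the body to the
    search endpoint and returns the parsed JSON response.
    """
    guids: List[str] = []
    offset = 0
    while True:
        body = {
            "keywords": None,
            "limit": page_size,
            "offset": offset,
            "filter": {
                "attributeName": "qualifiedName",
                "operator": "startswith",
                "value": qualified_name_prefix,
            },
        }
        # Assumption: results arrive under "value", each with an "id" GUID.
        page = post_search(body).get("value", [])
        guids.extend(item["id"] for item in page if "id" in item)
        # A short (or empty) page means there is nothing left to fetch.
        if len(page) < page_size:
            break
        offset += page_size
    return guids
```

    Injecting the HTTP call keeps authentication and retries out of the loop and lets you exercise the pagination logic against a fake before pointing it at a real account.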

    Step 2: Fetch lineage and target Process entities

    GET /datamap/api/atlas/v2/lineage/{guid}?direction=BOTH&depth=3

    The relationship IDs in the lineage response point back to Process entities (notebook runs, Spark jobs). For a refresh scenario, delete those Process entities directly rather than individual relationship GUIDs. Deleting a Process cascades its input/output relationships automatically and avoids leaving orphaned nodes behind.

    DELETE /datamap/api/atlas/v2/entity/guid/{processEntityGuid}
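
    One way to pick the Process entities out of a lineage response is sketched below. It assumes the Atlas-style response carries a guidEntityMap keyed by GUID with a typeName on each entry, and that you supply the set of Process type names used in your catalog. The function name, the process_type_names parameter, and the type names in the usage note are hypothetical.

```python
from typing import Dict, Set


def process_guids_from_lineage(
    lineage: Dict, process_type_names: Set[str]
) -> Set[str]:
    """Return GUIDs of entities in a lineage response whose type is a Process type.

    process_type_names is supplied by the caller; the lineage response alone
    does not say which types derive from Process.
    """
    # Assumption: entity headers live under "guidEntityMap", keyed by GUID.
    entity_map = lineage.get("guidEntityMap") or {}
    return {
        guid
        for guid, header in entity_map.items()
        if header.get("typeName") in process_type_names
    }
```

    For example, calling it with process_type_names={"my_notebook_process"} (a made-up type name) returns only the GUIDs of that type, which are the candidates for the DELETE call above.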

    Step 3: Filter by ownership before deleting

    Check createdBy on each entity before deletion to ensure you only remove what your refresh pipeline owns, leaving scan-managed or ADF-managed lineage intact.
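
    The ownership check could look something like the sketch below. It assumes each entity dict has already been fetched with its "guid" and "createdBy" fields, and that your refresh pipeline writes under one known identity. The function name and the pipeline_identity value are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def split_by_owner(
    entities: Iterable[Dict], pipeline_identity: str
) -> Tuple[List[str], List[str]]:
    """Split entity GUIDs into (safe to delete, leave alone) by createdBy."""
    to_delete: List[str] = []
    to_keep: List[str] = []
    for entity in entities:
        # Entities with a missing or different createdBy are kept, so
        # scan-managed or ADF-managed lineage is never deleted by default.
        if entity.get("createdBy") == pipeline_identity:
            to_delete.append(entity["guid"])
        else:
            to_keep.append(entity["guid"])
    return to_delete, to_keep
```

    Defaulting to "keep" when createdBy is absent is deliberate: it is safer to leave a stale Process behind than to remove lineage another tool owns.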

    Databricks-specific caveat

    If you are on Unity Catalog, Purview reads lineage from the system.access schema during scans. Purging relationships via API and then running a scan will re-ingest that lineage from Databricks. Coordinate your purge and lineage push around your scan schedule, or scope the scan to exclude the affected catalogs during the refresh window.

    For large lineage graphs, use the paginated traversal endpoint instead of increasing depth:

    GET /datamap/api/atlas/v2/lineage/{guid}/next/