Data System Wide Lineage via API Request

I'm struggling with finding a solution. My goal is to identify all existing lineage relationships for any data objects within a specific data system they belong to. I've been using the Purview REST A...

api

lineage

purview

GuidovanDijk

MCT

May 30, 2026

Hi,

There is no system-wide lineage endpoint, but here is an efficient approach for Databricks.

Step 1: Enumerate assets with a qualifiedName prefix filter

Use the Search API scoped to your metastore rather than just filtering by typeName:

POST /datamap/api/atlas/v2/search/query
{
  "keywords": null,
  "limit": 1000,
  "offset": 0,
  "filter": {
    "attributeName": "qualifiedName",
    "operator": "startswith",
    "value": "databricks://<your-metastore-id>"
  }
}

Paginate with offset until all GUIDs are collected.
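
For anyone scripting this, the offset loop could look roughly like the sketch below. It is not an official client: the `post` callable is a hypothetical stand-in for whatever authenticated HTTP helper you already have, the path and filter body are copied from the request above, and reading results from a `value` list with an `id` per hit is an assumption about the response shape that you should check against your own tenant.

```python
from typing import Callable, List

# Path as quoted in this post; verify it against the API reference for your account.
SEARCH_PATH = "/datamap/api/atlas/v2/search/query"


def build_search_body(prefix: str, offset: int, limit: int = 1000) -> dict:
    """Build the search request body shown above for one page."""
    return {
        "keywords": None,
        "limit": limit,
        "offset": offset,
        "filter": {
            "attributeName": "qualifiedName",
            "operator": "startswith",
            "value": prefix,
        },
    }


def collect_guids(
    post: Callable[[str, dict], dict], prefix: str, limit: int = 1000
) -> List[str]:
    """Page through search results and return the unique asset GUIDs in order.

    `post(path, body)` is a hypothetical helper that sends an authenticated
    POST and returns the parsed JSON response as a dict.
    """
    guids: List[str] = []
    seen = set()
    offset = 0
    while True:
        page = post(SEARCH_PATH, build_search_body(prefix, offset, limit))
        hits = page.get("value", [])  # assumption: hits are listed under "value"
        for hit in hits:
            guid = hit.get("id")  # assumption: each hit carries its GUID as "id"
            if guid and guid not in seen:
                seen.add(guid)
                guids.append(guid)
        if len(hits) < limit:  # a short (or empty) page means we reached the end
            break
        offset += limit
    return guids
```

Passing the HTTP call in as a function keeps authentication out of the loop and makes it easy to dry-run against canned responses before touching a real catalog.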

Step 2: Fetch lineage and target Process entities

GET /datamap/api/atlas/v2/lineage/{guid}?direction=BOTH&depth=3

The relationship IDs in the lineage response point back to Process entities (notebook runs, Spark jobs). For a refresh scenario, delete those Process entities directly rather than individual relationship GUIDs. Deleting a Process cascades its input/output relationships automatically and avoids leaving orphaned nodes behind.

DELETE /datamap/api/atlas/v2/entity/guid/{processEntityGuid}
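
A rough sketch of picking the Process GUIDs out of one lineage response, in case it helps. It assumes the Atlas-style response shape with a `guidEntityMap` and a `relations` list of `fromEntityId`/`toEntityId` pairs. The `process_types` set is something you supply yourself (the type names in your catalog that represent processes); the function names are illustrative and not part of any SDK.

```python
from typing import List, Set
from urllib.parse import quote, urlencode


def lineage_path(guid: str, direction: str = "BOTH", depth: int = 3) -> str:
    """Build the lineage request path shown above for one asset GUID."""
    query = urlencode({"direction": direction, "depth": depth})
    return f"/datamap/api/atlas/v2/lineage/{quote(guid, safe='')}?{query}"


def process_guids_from_lineage(lineage: dict, process_types: Set[str]) -> List[str]:
    """Return GUIDs of entities that look like Process nodes in a lineage response.

    An entity qualifies when it takes part in at least one relation and its
    typeName is in `process_types` (caller-supplied, an assumption about
    which type names represent processes in your catalog).
    """
    linked = set()
    for relation in lineage.get("relations", []):
        linked.add(relation.get("fromEntityId"))
        linked.add(relation.get("toEntityId"))

    entity_map = lineage.get("guidEntityMap", {})
    return sorted(
        guid
        for guid, header in entity_map.items()
        if guid in linked and header.get("typeName") in process_types
    )
```

Collect the results across all assets into a single set before deleting, since the same Process will show up in the lineage of every table it reads or writes.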

Step 3: Filter by ownership before deleting

Check createdBy on each entity before deletion to ensure you only remove what your refresh pipeline owns, leaving scan-managed or ADF-managed lineage intact.
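
The ownership check can be a small guard in front of the delete call. The sketch below assumes you have already fetched each candidate entity as a dict with `guid` and `createdBy` keys (for example via the entity GET by GUID), and that `pipeline_identity` is whatever value your refresh pipeline stamps into `createdBy`. Both the function name and that identity value are hypothetical.

```python
from typing import Iterable, List, Tuple


def split_by_owner(
    entities: Iterable[dict], pipeline_identity: str
) -> Tuple[List[str], List[str]]:
    """Split entity GUIDs into (safe_to_delete, keep).

    Only entities whose createdBy exactly matches `pipeline_identity` are
    marked for deletion. Entities with a different or missing createdBy are
    kept, so scan-managed or ADF-managed lineage is never touched by default.
    """
    to_delete: List[str] = []
    keep: List[str] = []
    for entity in entities:
        guid = entity.get("guid")
        if not guid:
            continue  # nothing to act on without a GUID
        if entity.get("createdBy") == pipeline_identity:
            to_delete.append(guid)
        else:
            keep.append(guid)
    return to_delete, keep
```

Logging the `keep` list on the first few runs is a cheap way to confirm the filter matches what you expect before any DELETE is sent.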

Databricks-specific caveat

If you are on Unity Catalog, Purview reads lineage from the system.access schema during scans. Purging relationships via API and then running a scan will re-ingest that lineage from Databricks. Coordinate your purge and lineage push around your scan schedule, or scope the scan to exclude the affected catalogs during the refresh window.

For large lineage graphs, use the paginated traversal endpoint instead of increasing depth:

GET /datamap/api/atlas/v2/lineage/{guid}/next/

southpawmurph
Copper Contributor
Jun 09, 2026
This approach works.
