Data System-Wide Lineage via API Request
Hi,
There is no system-wide lineage endpoint in the Data Map API; lineage is retrieved one entity GUID at a time. Here is an efficient approach for Databricks.
Step 1: Enumerate assets with a qualifiedName prefix filter
Use the Search API scoped to your metastore rather than just filtering by typeName:
POST /datamap/api/atlas/v2/search/query
{
  "keywords": null,
  "limit": 1000,
  "offset": 0,
  "filter": {
    "attributeName": "qualifiedName",
    "operator": "startswith",
    "value": "databricks://<your-metastore-id>"
  }
}
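Paginate by raising offset by limit on each call until a page returns fewer than limit results, collecting the id (entity GUID) of every hit. Every call also needs an api-version query parameter, and field names differ across versions (some document attributeValue rather than value, and continuationToken rather than offset), so check the Discovery Query reference for the version you call.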
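Step 1 can be sketched in Python as below. This is a minimal sketch, assuming you supply a post_query callable (hypothetical) that sends the search request with your api-version and auth and returns the parsed JSON. The request fields are copied from the body above; the assumption that hits are listed under value with the GUID in id should be checked against a real response.

```python
from typing import Any, Callable, Dict, List

# Hypothetical: post_query(body) sends POST /datamap/api/atlas/v2/search/query
# (with your api-version and auth) and returns the parsed JSON response.
PostQuery = Callable[[Dict[str, Any]], Dict[str, Any]]


def build_search_body(metastore_id: str, offset: int, limit: int = 1000) -> Dict[str, Any]:
    """Request body from Step 1; field names are copied from the post."""
    return {
        "keywords": None,
        "limit": limit,
        "offset": offset,
        "filter": {
            "attributeName": "qualifiedName",
            "operator": "startswith",
            "value": f"databricks://{metastore_id}",
        },
    }


def collect_guids(post_query: PostQuery, metastore_id: str, limit: int = 1000) -> List[str]:
    """Page through search hits by offset until a short or empty page comes back."""
    guids: List[str] = []
    offset = 0
    while True:
        page = post_query(build_search_body(metastore_id, offset, limit))
        hits = page.get("value", [])  # assumption: hits are listed under "value"
        guids.extend(hit["id"] for hit in hits)  # assumption: "id" holds the entity GUID
        if len(hits) < limit:
            break
        offset += limit
    return guids
```

Injecting the HTTP call keeps the paging logic testable without a live account; wrap whichever HTTP client and credential flow you already use inside post_query.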
Step 2: Fetch lineage and target Process entities
GET /datamap/api/atlas/v2/lineage/{guid}?direction=BOTH&depth=3
In the lineage response, relations lists the edges (fromEntityId, toEntityId, relationshipId), and guidEntityMap holds a header for every node, including the Process entities (notebook runs, Spark jobs) that sit between datasets. For a refresh scenario, delete those Process entities directly rather than individual relationship GUIDs. Deleting a Process cascades its input/output relationships automatically and avoids leaving orphaned nodes behind.
DELETE /datamap/api/atlas/v2/entity/guid/{processEntityGuid}
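A sketch of picking Process GUIDs out of one lineage response, assuming the Atlas-style relations/guidEntityMap shape described above. process_guids and delete_path are hypothetical helpers, and process_types is a set of Process type names you supply; the names used in any example are placeholders, not real Purview type names.

```python
from typing import Any, Dict, Iterable, Set
from urllib.parse import quote


def process_guids(lineage: Dict[str, Any], process_types: Iterable[str]) -> Set[str]:
    """Return GUIDs from guidEntityMap whose typeName is one of the given Process types."""
    wanted = set(process_types)
    entity_map = lineage.get("guidEntityMap", {})
    return {
        guid
        for guid, header in entity_map.items()
        if header.get("typeName") in wanted
    }


def delete_path(guid: str) -> str:
    """Path for the single-entity DELETE from Step 2 (GUID is URL-quoted for safety)."""
    return f"/datamap/api/atlas/v2/entity/guid/{quote(guid, safe='')}"
```

Collect these GUIDs across all assets from Step 1 into one set before deleting, since neighbouring tables share the same Process nodes and would otherwise produce duplicate DELETE calls.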
Step 3: Filter by ownership before deleting
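Check createdBy on each entity before deletion so you only remove what your refresh pipeline owns, leaving scan-managed or ADF-managed lineage intact. The entity headers in the lineage response do not carry createdBy, so fetch each candidate with GET /datamap/api/atlas/v2/entity/guid/{guid} and read entity.createdBy.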
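Step 3 as a sketch: get_entity is a hypothetical callable wrapping GET /datamap/api/atlas/v2/entity/guid/{guid}, whose response wraps the entity under an entity key. pipeline_identity is an assumption: use whatever createdBy string Purview recorded on an entity your pipeline pushed.

```python
from typing import Any, Callable, Dict, Iterable, List

# Hypothetical: get_entity(guid) calls GET /datamap/api/atlas/v2/entity/guid/{guid}
# and returns the parsed JSON, with the entity itself under "entity".
GetEntity = Callable[[str], Dict[str, Any]]


def owned_by(get_entity: GetEntity, guids: Iterable[str], pipeline_identity: str) -> List[str]:
    """Keep only GUIDs whose entity.createdBy equals the refresh pipeline's identity."""
    keep: List[str] = []
    for guid in guids:
        entity = get_entity(guid).get("entity", {})
        if entity.get("createdBy") == pipeline_identity:
            keep.append(guid)
    return keep
```

An exact string match is deliberately strict: anything created by a scan or by another pipeline is left alone.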
Databricks-specific caveat
If you are on Unity Catalog, Purview reads lineage from the system.access schema during scans. Purging relationships via API and then running a scan will re-ingest that lineage from Databricks. Coordinate your purge and lineage push around your scan schedule, or scope the scan to exclude the affected catalogs during the refresh window.
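For large lineage graphs, page through results with the next-page endpoint instead of increasing depth. It takes direction, offset, and limit query parameters:
GET /datamap/api/atlas/v2/lineage/{guid}/next/?direction=BOTH&offset=0&limit=100
Check the path against the Lineage - Get Next Page reference for your api-version; it may be listed without the atlas/v2 segment (/datamap/api/lineage/{guid}/next/).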
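A sketch of offset paging against the next-page endpoint, assuming it accepts offset and limit and returns relations in the same shape as the regular lineage call. get_next is a hypothetical wrapper around the HTTP request; because the end-of-data signal is not stated here, the loop stops when a page contributes no new relations, with max_pages as a safety cap.

```python
from typing import Any, Callable, Dict, List, Set, Tuple

# Hypothetical: get_next(offset, limit) calls the next-page lineage endpoint for one GUID
# (with your direction and api-version) and returns the parsed JSON page.
GetNext = Callable[[int, int], Dict[str, Any]]


def page_lineage(get_next: GetNext, limit: int = 100, max_pages: int = 1000) -> List[Dict[str, Any]]:
    """Accumulate unique relations page by page until a page adds nothing new."""
    seen: Set[Tuple[Any, Any, Any]] = set()
    relations: List[Dict[str, Any]] = []
    offset = 0
    for _ in range(max_pages):
        page = get_next(offset, limit)
        added = 0
        for rel in page.get("relations", []):
            # Deduplicate on the full edge so overlapping pages are harmless.
            key = (rel.get("fromEntityId"), rel.get("toEntityId"), rel.get("relationshipId"))
            if key not in seen:
                seen.add(key)
                relations.append(rel)
                added += 1
        if added == 0:
            break
        offset += limit
    return relations
```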