Forum Discussion

Kevin_H_'s avatar
Kevin_H_
Copper Contributor
Jul 12, 2024

Schematized source: when tables or columns are deleted in the source...?

I'm new to using Purview. I started using Data Map to scan in assets from an Azure SQL Database, and enrich those assets with additional metadata (descriptions, classifications, sensitivity labels). Then I intend to have many end users browsing the Data Catalog to discover data, with this additional metadata. 

 

I was concerned about potentially losing the "additional metadata" that I and my team manually enter into Purview. The schema of our Azure SQL Database does change over time, so I wondered: what happens if in the source database, some columns/tables are dropped or renamed, then we run the scan in Purview again? Would we lose any of the additional metadata we entered into Purview?

 

I did some quick testing, and it seems like "yes," in some cases we could. From my testing, it seems like if a column like `column_x` is dropped from an existing table in the source, or renamed to `column_y`; then an incremental Purview scan occurs; then, the new version of the table in the Purview Data Catalog no longer has `column_x`. It's gone. And whatever additional metadata I had entered for that column is lost. 

 

On the other hand, if a table like `table_x` is dropped in the source; then an incremental Purview scan occurs; the old table with the additional metadata we added is still there in Data Catalog. (And presumably, if `table_x` comes back into existence later, with the same columns, then my additional metadata would still be there and still be applicable. Though I didn't test this.)

 

Anyway, I am wondering how people handle this. Like maybe a column gets renamed in the source...but I don't want my scheduled Purview scans to cause me to lose the additional metadata we had entered for `column_x`! I want it to be retained, or at least I want to re-enter it, for the same column with its new name `column_y`. Copy and paste it. Do you ever back up your additional metadata, by exporting assets and their metadata? (Is this even possible? I think I read you could do this.) Any other solutions? 

  • Kevin_H_'s avatar
    Kevin_H_
    Copper Contributor

    Oh, I found this page, which I hadn't seen yet; it has a section about "How scans detect deleted assets"...

    https://learn.microsoft.com/en-us/purview/concept-scans-and-ingestion

     

    It actually seems like the intent is that an asset (database table) deleted in the source will be deleted from the Data Catalog. But maybe not right away, because "Deleted assets might continue to be visible in your catalog for some time before they are processed and removed." So it seems after more scans and more days, I should expect that my `table_x` will also disappear from Data Catalog, and the additional metadata we entered about it will be lost. 

     

    Which maybe isn't so bad; I can understand why that would be the intent. But again, it'd be nice to have a backup or something, in case I want to recover what was lost. But how?

  • Kevin_H_'s avatar
    Kevin_H_
    Copper Contributor
    Also, I tried exporting assets, to see if it is a way to back up my additional metadata. But it says I can only only export Business Assets, and not Physical Assets. That doesn't help. So, I don't see any solutions.

Resources