delta tables
3 Topics

Purview Integration with MS Fabric (Scanner)
Hi everyone, I'm facing an issue scanning Lakehouse Delta tables in Purview. When I scan the Fabric workspace in Purview, I can only see the Pipelines and Notebooks present in that workspace; it doesn't identify the Lakehouse tables as assets. The prerequisites are also done: the Purview MSI has been granted Contributor role access to that workspace in Fabric, and the Fabric tenant-level settings are enabled for a specific security group (with the Purview MSI as a member). Please help me with this: how can I get Lakehouse tables identified as assets and extract their metadata in Purview? This will help me proceed with data cataloging. Thanks in advance. Regards, BanuMurali

Data archiving of delta table in Azure Databricks
Hi all, I am currently researching data archiving for Delta table data on the Azure platform, as my company has a data retention policy. I have studied the official Databricks documentation on archival support (https://docs.databricks.com/en/optimizations/archive-delta.html). It says: "If you enable this setting without having lifecycle policies set for your cloud object storage, Databricks still ignores files based on this specified threshold, but no data is archived." I am therefore looking at how to configure a lifecycle policy on the Azure storage account, following the Microsoft documentation (https://learn.microsoft.com/en-us/azure/storage/blobs/lifecycle-management-overview). Say the Delta table data is stored under "test-container/sales", with many "part-xxxx.snappy.parquet" data files in that folder. Should I simply specify "tierToArchive", "daysAfterCreationGreaterThan: 1825", and "prefixMatch": ["test-container/sales"]? However, I am worried: will this archive mechanism impact normal Delta table operations? Also, what if a Parquet data file moved to the archive tier contains both data created more than five years ago and data created more recently — is that possible? Could it move data to the archive tier before it is five years old? I would highly appreciate it if someone could help me with the questions above. Thanks in advance.

Unable to load large delta table in azure ml studio
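For the Databricks archiving question above: a lifecycle-management rule along the lines described there might look like the sketch below. The rule name is illustrative; the prefix must start with the container name, exactly as in the question. Note that the policy tiers whole blobs based on blob age, so an entire Parquet file moves (or doesn't) as a unit — it is never split by the age of the rows inside it.

```json
{
  "rules": [
    {
      "name": "archive-sales-after-5-years",
      "enabled": true,
      "type": "Lifecycle",
      "definition": {
        "filters": {
          "blobTypes": ["blockBlob"],
          "prefixMatch": ["test-container/sales"]
        },
        "actions": {
          "baseBlob": {
            "tierToArchive": {
              "daysAfterCreationGreaterThan": 1825
            }
          }
        }
      }
    }
  ]
}
```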
I am writing to report an issue I am currently experiencing while trying to read a Delta table from Azure ML. I have already created data assets to register the Delta table, which is located at an ADLS location. However, when loading the data, I have noticed that for large data sizes it takes an exceedingly long time. I have confirmed that for small data sizes the data is returned within a few seconds, which leads me to believe there may be a scalability issue in the data loading process. I would greatly appreciate it if you could investigate this issue and provide me with any recommendations or solutions. I can provide additional details such as the size of the data, the steps I am taking to load it, and any error messages if required. I'm following this document: https://learn.microsoft.com/en-us/python/api/mltable/mltable.mltable?view=azure-ml-py#mltable-mltable-from-delta-lake

I am using this command to read the Delta table via the data asset URI:

```python
from mltable import from_delta_lake

# <DATA ASSET URI> is the URI of the registered delta table data asset
mltable_ts = from_delta_lake(
    delta_table_uri=<DATA ASSET URI>,
    timestamp_as_of="2999-08-26T00:00:00Z",
    include_path_column=True,
)
```
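One thing worth checking in the snippet above is how the snapshot is selected: `from_delta_lake` accepts either `timestamp_as_of` or `version_as_of`, and a far-future timestamp like "2999-08-26T00:00:00Z" resolves to the latest version but still requires scanning the Delta log to find it. Below is a minimal, hypothetical helper (`delta_lake_args` is not part of the mltable API) that builds the keyword arguments and enforces that exactly one snapshot selector is supplied:

```python
# Hypothetical helper (not part of mltable): build keyword arguments for
# mltable.from_delta_lake, requiring exactly one snapshot selector so the
# read is pinned to a single, explicit table version or timestamp.
def delta_lake_args(delta_table_uri, version_as_of=None, timestamp_as_of=None,
                    include_path_column=False):
    if (version_as_of is None) == (timestamp_as_of is None):
        raise ValueError("pass exactly one of version_as_of or timestamp_as_of")
    args = {"delta_table_uri": delta_table_uri,
            "include_path_column": include_path_column}
    if version_as_of is not None:
        args["version_as_of"] = version_as_of
    else:
        args["timestamp_as_of"] = timestamp_as_of
    return args

# Example with an illustrative URI: pin to a known table version instead
# of a far-future timestamp.
kwargs = delta_lake_args("azureml://example-data-asset-uri", version_as_of=3)
# mltable_ts = from_delta_lake(**kwargs)   # would run inside Azure ML
```

If most of the time is spent materializing the table, it may also help to narrow it before conversion (for example with MLTable's `keep_columns` or `take`); whether that helps depends on the table's size and layout.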