In managing data within Azure blob storage accounts and Azure data lake gen 2 storage accounts, organizations often encounter scenarios where blobs have been deleted but remain in a soft-deleted state. To calculate the remaining retention days for all such blobs across an entire storage account can be a critical requirement for customers seeking to optimize data management and ensure compliance with retention policies.
Additionally, certain blobs may have an expiry time set, scheduling their deletion for a future date. To facilitate the identification and monitoring of these blobs and their respective expiry times, a custom query has been written to efficiently list and calculate expiry information, enabling users to proactively manage their storage resources.
The expiry time for Azure blobs is set using the Set Blob Expiry operation. This feature is present in only Hierarchical namespace enabled storage accounts.
We can set the expiry with below steps:
i) Azure Storage action- About Azure Storage Actions - Azure Storage Actions | Microsoft Learn
Storage action can be used to set blob expiry, share a high-level snippet for the operation below
ii) REST API- Set Blob Expiry (REST API) - Azure Storage | Microsoft Learn to set the expiry time for your blobs. This ensures that each blob has a defined lifecycle and will be deleted after the specified period.
In this blog, it is a step-by-step process of listing the expiry time and retention of the blobs using Blob Inventory report and then parsing it using Synapse.
1. Set blob inventory rule
Get CSV file from blob inventory run.
Go to the container where inventory reports are getting stored.
Navigate to the recent date folder and get url of Blob Inventory csv life.
Sharing the below snippet for reference:
2. Create an Azure Synapse workspace
Next, create an Azure Synapse workspace where you will execute a SQL query to report the inventory results.
Create the SQL query: After you create your Azure Synapse workspace, do the following steps.
- Navigate to https://web.azuresynapse.net.
- Select the Develop tab on the left edge.
- Select the large plus sign (+) to add an item.
- Select SQL script.
3. Use the sample query below to get the expiry time and remaining retention days of blob respectively
select
LEFT([Name], CHARINDEX('/', [Name]) - 1) AS Container,
RIGHT([Name], LEN([Name])- CHARINDEX('/',[Name])) AS Blob,
[Expiry-time]
from OPENROWSET(
bulk '<URL to your inventory CSV file>',
format='csv', parser_version='2.0', header_row=true
) as Source
For blobs which got deleted directly, you can calculate the remaining retention days since the data is present in soft deleted state and will be deleted permanently after the retention days complete.
select
LEFT([Name], CHARINDEX('/', [Name]) - 1) AS Container,
RIGHT([Name], LEN([Name])- CHARINDEX('/',[Name])) AS Blob,
[Expiry-time],
RemainingRetentionDays
from OPENROWSET(
bulk '<URL to your inventory CSV file>',
format='csv', parser_version='2.0', header_row=true
) as Source
In the above snippet, the Null value represents that the blob is not deleted and no expiry time is set on the blob yet.
Please Note: Calculating blob expiry from blob inventory is one way, customer can explore other options such as Powershell and Azure CLI to achieve the same.
Reference links:-
Set Blob Expiry (REST API) - Azure Storage | Microsoft Learn
Create a storage task - Azure Storage Actions | Microsoft Learn
Azure Storage blob inventory | Microsoft Learn
Calculate blob count and size using Azure Storage inventory | Microsoft Learn