This article describes how to identify the blobs with index tags and how to remove those tags using the Blob Inventory Service and Python SDK.
This article is divided into two independent sections. You can follow the steps in either section without following the steps in the other:
Use the Blob Inventory service to identify the blobs with index tags
Remove the Blob Index Tags
Azure Storage blob inventory provides a list of the containers, blobs, blob versions, and snapshots in your storage account, along with their associated properties. It generates an output report in either comma-separated values (CSV) or Apache Parquet format on a daily or weekly basis. You can use the report to audit the retention, legal hold, or encryption status of your storage account contents, or to understand the total data size, age, tier distribution, or other attributes of your data. For more information, see Azure Storage blob inventory.
For the steps to enable the inventory report, see Enable Azure Storage blob inventory reports.
Please see below how to define a blob inventory rule to search all blobs:
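As a rough sketch of what such a rule can look like, the Python snippet below builds an inventory policy document and prints it as JSON. The rule name and destination container are hypothetical placeholders, and the layout follows the blob inventory policy schema as commonly documented, so verify the field names against the current documentation before applying it. The only fields that matter for this article are Name, Tags, and TagCount.

```python
import json

# Hypothetical names: replace with your own rule name and destination container.
RULE_NAME = "blobindextagsrule"
DESTINATION_CONTAINER = "inventory-reports"


def build_inventory_policy(rule_name, destination):
    """Return an inventory policy dict with one daily CSV rule covering all blobs."""
    return {
        "enabled": True,
        "type": "Inventory",
        "rules": [
            {
                "enabled": True,
                "name": rule_name,
                "destination": destination,
                "definition": {
                    "objectType": "Blob",
                    "format": "Csv",
                    "schedule": "Daily",
                    "filters": {
                        # No prefixMatch filter, so blobs in every container are listed.
                        "blobTypes": ["blockBlob", "appendBlob", "pageBlob"],
                    },
                    # Name identifies the blob; Tags and TagCount expose its index tags.
                    "schemaFields": ["Name", "Tags", "TagCount"],
                },
            }
        ],
    }


policy = build_inventory_policy(RULE_NAME, DESTINATION_CONTAINER)
print(json.dumps(policy, indent=2))
```

The printed JSON is only an illustration of the rule's shape; the same settings can be chosen interactively when adding the rule in the Azure portal.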
The blob inventory result will have the information as follows:
Support documentation:
In this section, you can find a script to remove the blob index tags from all the blobs under a specific container using Python SDK.
Please note that once these blob index tags are removed, they cannot be recovered. So, apply these steps only when you are sure that you no longer need to use your blob index tags.
Prerequisites
Download or use any Python IDE of your choice.
Install the required Python packages (pandas is needed only for Script 1):
pip install azure-storage-blob pandas
Sample scripts:
Special notes:
If you ran the blob inventory report to identify the blobs with index tags (Section 1), you can use Script 1 below to list the containers that hold blobs with index tags. If you already know the names of the containers, skip this script.
Script 1
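# Please update the below parameter with your own information before executing this script:
# inventoryPath: The path to the blob inventory report file
import pandas as pd

inventoryPath = "C:\\XXX\\blobindextagsruleFILE.csv"
# Load the inventory report (CSV format)
df = pd.read_csv(inventoryPath, sep = ",")
# The Name field has the form container/blob-path, so the first segment is the container
df['container'] = df['Name'].str.split('/').str[0]
# Keep only the blobs that have at least one index tag
df = df[df['TagCount'] > 0]
# Print each container name once
containers = df['container'].drop_duplicates()
for container in containers:
    print(container)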
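Where pandas is not available, or the report is too large to load into memory comfortably, the same filtering can be sketched with the standard library only. This is a minimal illustration, assuming the report is a CSV with Name and TagCount columns; the function name and the sample rows are made up for the example.

```python
import csv
import io


def containers_with_tags(report_file):
    """Return the sorted container names that hold at least one tagged blob.

    Assumes the report has 'Name' (container/blob-path) and 'TagCount' columns.
    """
    containers = set()
    for row in csv.DictReader(report_file):
        tag_count = (row.get("TagCount") or "").strip()
        # Blobs without tags may have an empty TagCount field in the report.
        if tag_count and int(float(tag_count)) > 0:
            containers.add(row["Name"].split("/", 1)[0])
    return sorted(containers)


# Made-up sample rows standing in for a real inventory report.
sample_report = io.StringIO(
    "Name,TagCount\n"
    "photos/2023/a.jpg,2\n"
    "photos/2023/b.jpg,\n"
    "logs/app.log,0\n"
    "archive/old/data.bin,1\n"
)
print(containers_with_tags(sample_report))  # ['archive', 'photos']
```

For a real report, pass an open file instead of the sample, for example `open(inventoryPath, newline="")`.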
After identifying the containers that hold blobs with index tags, run Script 2 below to remove all index tags from the blobs in one container. We advise running the script once for each container; you can run several instances of the script in parallel.
Script 2
# Please update the below parameters with your own information before executing this script:
# account_name: Storage account name.
# account_key: Storage account key.
# container_name: Name of the container where the blobs with index tags are.
from azure.storage.blob import BlobServiceClient
from concurrent.futures import ThreadPoolExecutor

# Define your storage account name and key, and the container name
account_name = "XXX"
account_key = "XXX"
container_name = "XXX"
# Define the number of concurrent threads
concurrency = 250
# Count the number of blobs with index tags
blob_count = 0
# Create a BlobServiceClient object
blob_service_client = BlobServiceClient(account_url=f"https://{account_name}.blob.core.windows.net", credential=account_key)

# Function to remove all index tags from a blob
def remove_blob_index_tag(blob_name):
    # Get the blob client
    blob_client = blob_service_client.get_blob_client(container=container_name, blob=blob_name)
    # Setting the tags to None replaces the existing tag set with an empty one
    blob_client.set_blob_tags(tags=None)

# Create a ThreadPoolExecutor with the specified concurrency
with ThreadPoolExecutor(max_workers=concurrency) as executor:
    container_client = blob_service_client.get_container_client(container_name)
    # Keep every submitted task so that none is lost and errors can be surfaced
    futures = []
    for blob in container_client.list_blobs():
        # Get the blob client
        blob_client = blob_service_client.get_blob_client(container=container_name, blob=blob.name)
        blob_tags = blob_client.get_blob_tags()
        # Check if index tags exist
        if blob_tags:
            futures.append(executor.submit(remove_blob_index_tag, blob.name))
    # Wait for each removal to finish; result() re-raises any error from a worker thread
    for future in futures:
        future.result()
        blob_count += 1
print(f"This script removed index tags on {blob_count} blobs")
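Script 2 issues one get_blob_tags request per blob before deciding whether to remove anything. As a possible variation, the listing call itself can be asked to return tags, which avoids that extra round trip. The sketch below is an illustration rather than a drop-in replacement: the function name and the default worker count are assumptions, and it expects an object that behaves like the SDK's ContainerClient, using only list_blobs(include=["tags"]), get_blob_client, and set_blob_tags.

```python
from concurrent.futures import ThreadPoolExecutor


def remove_all_index_tags(container_client, max_workers=16):
    """Clear index tags on every tagged blob in a container; return how many were cleared.

    container_client is expected to behave like azure.storage.blob.ContainerClient.
    """
    def clear_tags(blob_name):
        # An empty tag set replaces whatever tags the blob currently has.
        container_client.get_blob_client(blob_name).set_blob_tags({})

    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = [
            executor.submit(clear_tags, blob.name)
            # include=["tags"] asks the listing to return each blob's tags.
            for blob in container_client.list_blobs(include=["tags"])
            if blob.tags
        ]
        # result() re-raises any exception raised in a worker thread.
        for future in futures:
            future.result()
    return len(futures)
```

With a real storage account, container_client would be obtained as in Script 2, through blob_service_client.get_container_client(container_name), and the returned count can be printed in the same way as blob_count.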