This blog looks at analyzing the capacity of your storage account to understand which service is consuming it. We will primarily discuss the Blob storage service. It covers the metrics you can leverage, the features that may lead to extra capacity being consumed in your storage account, and the measures you can take to identify them.
Identify/understand the size of the storage account and the different storage services
The maximum capacity of a storage account is defined in the scalability targets documentation. To see how much capacity your storage account is currently consuming, check Metrics under the Monitoring section. Select Account as the Metric Namespace and Used Capacity as the Metric. This shows the capacity used by your storage account.
To check a specific service, select that storage service as the Metric Namespace: Blob, File, Queue or Table. Depending on the namespace you select, the Metric dropdown offers further options; e.g., for the blob service it offers Blob Capacity, Blob Container Count and Blob Count.
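If you want to query the same metrics outside the portal, each Metric Namespace corresponds to a resource ID: the account itself, or a sub-resource such as blobServices/default. The sketch below is only an illustration of that mapping; the helper name `metric_target` and the `SERVICES` table are made up for this example, and the metric names should be verified against the current Azure Monitor metrics reference before you rely on them.

```python
# Sub-resource suffix and capacity-related metric names per storage service.
# Assumption: these follow the Azure Monitor names for storage accounts at the
# time of writing; verify them against the metrics reference.
SERVICES = {
    "account": ("", ["UsedCapacity"]),
    "blob": ("/blobServices/default", ["BlobCapacity", "BlobCount", "ContainerCount"]),
    "file": ("/fileServices/default", ["FileCapacity", "FileCount", "FileShareCount"]),
    "queue": ("/queueServices/default", ["QueueCapacity", "QueueCount", "QueueMessageCount"]),
    "table": ("/tableServices/default", ["TableCapacity", "TableCount", "TableEntityCount"]),
}


def metric_target(subscription_id, resource_group, account_name, service="account"):
    """Return (resource_id, metric_names) for the chosen metric namespace.

    Hypothetical helper for illustration only; it builds strings and does not
    call any Azure API.
    """
    suffix, metrics = SERVICES[service]
    account_id = (
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.Storage/storageAccounts/{account_name}"
    )
    return account_id + suffix, list(metrics)
```

The returned resource ID and metric names can then be passed to whichever metrics client you prefer, for example `az monitor metrics list --resource <resource_id> --metric BlobCapacity`.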
Using the available filtering and splitting options, you can narrow down your investigation further; e.g., for the blob service you can drill down by blob tier or blob type. There is no such breakdown available for Queue/Table capacity. For example, splitting Blob Capacity by blob type shows how the capacity is distributed across blob types.
Another option is to open Insights under the Monitoring section and navigate to the Capacity tab. This view also provides a breakdown of account capacity and shows how much each individual service is consuming within the storage account. A graph on the same page also shows counts of storage units, such as the number of blobs.
Analyzing the Blob Storage Capacity
Up to this point, you have a high-level idea of how much capacity your blob storage is using. For further analysis, you can start with the article below. It references different options, such as Azure Storage Explorer, PowerShell scripts and CLI scripts, that you can explore as feasible.
Folder Statistics is the simplest option: with a single click it calculates the size of a storage container and counts the number of blobs in it. It can be a little challenging if you have a deeply nested structure and want all the details in one shot. In that case, you can use the PowerShell scripts below, which help with similar asks of calculating capacity at the account/container level. Based on your use case, you can customize the scripts further:
If the sizes of the individual storage containers do not add up to the total size of your blob storage, let’s see what next steps can be followed.
Verifying if data protection features are enabled
Check whether data protection features such as soft delete for blobs, blob versioning, etc. are enabled on the storage account. These features provide an extra level of data protection and help you recover data after an unintentional delete operation, depending on the retention period you chose while configuring them. It is important to understand these features thoroughly, so let’s see how they can add to capacity.
Whenever you delete blobs while soft delete is enabled, the blobs continue to reside in the storage account until their actual expiry time is reached. You can work out the expiry time from Deleted Time and RemainingRetentionDays (Days until permanent delete). Deleted Time is the time when the delete operation was performed on the blob, whereas RemainingRetentionDays is the number of days after which the soft deleted blob will be permanently deleted. For example, consider a storage account with soft delete for blobs enabled and a retention period of 7 days. Now, let’s say you deleted the blobs on 5th Dec at 01:00 PM UTC. In this scenario, the blobs will continue to reside in the storage account until approximately 12th Dec at 01:00 PM UTC (i.e., until the respective blobs expire). These soft deleted blobs add to the capacity. Thereafter, the garbage collector collects all the blobs marked for deletion and cleans them up. Please note that there may be some delay before the garbage collector catches up on the blobs marked for deletion. That is also why you may not see an immediate drop in capacity after multi-million deletes.
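The expiry arithmetic is simply the Deleted Time plus the retention period. As a minimal sketch (the function name is made up for illustration, and the year is arbitrary because the example above does not specify one), a blob deleted on 5th Dec at 01:00 PM UTC with a 7-day retention becomes eligible for permanent deletion on 12th Dec at 01:00 PM UTC:

```python
from datetime import datetime, timedelta, timezone


def permanent_delete_time(deleted_time, retention_days):
    """Earliest time at which a soft deleted blob expires.

    Illustrative helper: the actual removal happens some time after this,
    once the garbage collector has processed the blob.
    """
    return deleted_time + timedelta(days=retention_days)


# Year chosen arbitrarily for the example.
deleted = datetime(2023, 12, 5, 13, 0, tzinfo=timezone.utc)
expiry = permanent_delete_time(deleted, retention_days=7)
print(expiry.isoformat())  # 2023-12-12T13:00:00+00:00
```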
Similarly, if versioning is enabled, a new version is created each time a blob is modified or deleted, and these previous versions also add to capacity. The behavior differs when versioning is enabled together with soft delete.
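To see how much of a container's listing is made up of soft deleted blobs, snapshots and previous versions, you can group the listed blobs by state. The sketch below is a rough illustration rather than a definitive tool: `classify` and `capacity_by_state` are hypothetical helpers that only expect objects with `size`, `deleted`, `snapshot`, `version_id` and `is_current_version` attributes. The assumption is that these match what the azure-storage-blob Python SDK returns from `ContainerClient.list_blobs(include=["deleted", "snapshots", "versions"])`; the stand-in objects at the end exist only to make the example runnable.

```python
from collections import defaultdict
from types import SimpleNamespace  # stand-in objects for the example only


def classify(blob):
    """Bucket a listed blob as soft-deleted, snapshot, previous version or active."""
    if getattr(blob, "deleted", False):
        return "soft-deleted"
    if getattr(blob, "snapshot", None):
        return "snapshot"
    if getattr(blob, "version_id", None) and not getattr(blob, "is_current_version", False):
        return "previous version"
    return "active"


def capacity_by_state(blobs):
    """Sum blob count and content length (bytes) per state."""
    totals = defaultdict(lambda: {"count": 0, "bytes": 0})
    for blob in blobs:
        state = classify(blob)
        totals[state]["count"] += 1
        totals[state]["bytes"] += blob.size or 0
    return dict(totals)


# Made-up listing used purely for demonstration.
sample = [
    SimpleNamespace(size=100, deleted=False, snapshot=None, version_id="v2", is_current_version=True),
    SimpleNamespace(size=80, deleted=False, snapshot=None, version_id="v1", is_current_version=None),
    SimpleNamespace(size=50, deleted=True, snapshot=None, version_id=None, is_current_version=None),
    SimpleNamespace(size=30, deleted=False, snapshot="2023-12-05T13:00:00Z", version_id=None, is_current_version=None),
]
summary = capacity_by_state(sample)
```

Note that the summed sizes are content lengths. Snapshots and versions that share blocks with the base blob may be billed for less than this, so treat the result as an upper-bound indication rather than an exact figure.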
Cleaning up storage space occupied due to data protection features
To identify whether there are any soft deleted or versioned blobs, you can check by following the link below:
Once you have verified that capacity is being consumed by soft deleted blobs, the question is how to clear them and regain that capacity.
To clear soft deleted blobs, follow the steps below:
Turn off soft delete. This is required when using the Delete Blob API, because a Delete call is always a soft delete while the feature is enabled.
Iterate over the blob container you want to clean up and undelete the soft deleted blobs and snapshots.
Iterate over the blob container again and delete the blobs and snapshots you want to permanently delete.
Re-enable soft delete (in case you want it enabled again, with an appropriate retention period).
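The undelete and delete steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not a replacement for the PowerShell script: `purge_soft_deleted` is a hypothetical helper, `container_client` is assumed to behave like `ContainerClient` from the azure-storage-blob Python SDK (`list_blobs`, `get_blob_client`, and on the blob client `undelete_blob` and `delete_blob`), and soft delete must already be turned off, otherwise the final delete is just another soft delete. It only handles soft deleted base blobs, and deleting with `delete_snapshots="include"` also removes any snapshots of those blobs.

```python
def purge_soft_deleted(container_client, name_starts_with=None):
    """Undelete, then permanently delete, soft deleted blobs in one container.

    Assumes soft delete is ALREADY disabled on the account. This is
    destructive, so try it on a test container first.
    """
    # Collect names first so the listing is not modified while iterating it.
    names = [
        blob.name
        for blob in container_client.list_blobs(
            name_starts_with=name_starts_with, include=["deleted"]
        )
        if blob.deleted
    ]
    for name in names:
        blob_client = container_client.get_blob_client(name)
        blob_client.undelete_blob()  # restores the blob and its soft deleted snapshots
        blob_client.delete_blob(delete_snapshots="include")  # permanent once soft delete is off
    return names
```

With the real SDK, `container_client` would typically come from `BlobServiceClient.get_container_client(...)`; the optional `name_starts_with` prefix lets you limit the cleanup to one virtual folder.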
Please note that if anything gets deleted while the feature is turned off, recovery is only possible on a best-effort basis, and you will have to raise a support ticket for it. Below is the link to the PowerShell script that you can leverage for the cleanup.
It is recommended to perform thorough testing of the above rectification options before implementing them in your production environment.
Overall, by reviewing the above metrics and checking the options suggested above, you should be able to drill down, analyze and understand which service is consuming the capacity in your environment. You can then take further measures accordingly.