This article shows how to get the total number of blobs and the total capacity per storage account, per container, or per directory using the Blob Inventory service.
Approach
This article presents how to take advantage of the Blob Inventory service to get the total blob count and the total capacity per storage account, per container, or per directory.
I will present the steps to create the blob inventory rule and show how to obtain the needed information just by using the prefix match field, without having to process the blob inventory result files.
Additional support documentation is presented at the end of the article.
Introduction to the Blob Inventory Service
Azure Storage blob inventory provides a list of the containers, blobs, blob versions, and snapshots in your storage account, along with their associated properties. It generates an output report in either comma-separated values (CSV) or Apache Parquet format on a daily or weekly basis. You can use the report to audit the retention, legal hold, or encryption status of your storage account contents, or to understand the total data size, age, tier distribution, or other attributes of your data. See the Azure Storage blob inventory documentation for full details.
In this article, I will focus on using this service to get the blob count and the capacity.
Steps to enable inventory report
Please see below how to define a blob inventory rule to get the intended information, using the Azure portal:
- Sign in to the Azure portal to get started.
- Locate your storage account and display the account overview.
- Under Data management, select Blob inventory.
- Select Add your first inventory rule if you do not have any rule defined, or select Add a rule if you already have at least one rule defined.
- Add a new inventory rule by filling in the following fields:
- Rule name: The name of your blob inventory rule.
- Container: Container to store the result of the blob inventory rule execution.
- Object type to inventory: Select Blob.
- Blob types:
- Blob Storage: Select all (Block blobs, Page blobs, Append blobs).
- Data Lake Storage: Select all (Block blobs, Append blobs).
- Subtypes:
- Blob Storage: Select all (Include blob versions, Include snapshots, Include deleted blobs).
- Data Lake Storage: Select all (Include snapshots, Include deleted blobs).
- Blob inventory fields: The documentation lists all custom schema fields supported for blob inventory. In this scenario, we need to select at least the following fields:
- Blob Storage: Name, Creation-Time, ETag, Content-Length, Snapshot, VersionId, IsCurrentVersion, Deleted, RemainingRetentionDays.
- Data Lake Storage: Name, Creation-Time, ETag, Content-Length, Snapshot, DeletionId, Deleted, DeletedTime, RemainingRetentionDays.
- Inventory frequency: A blob inventory run is automatically scheduled every day when daily is chosen. Selecting the weekly schedule triggers the inventory run only on Sundays. A daily execution will return results faster.
- Export format: The format of the output report, either a CSV file or a Parquet file.
- Prefix match: Filter blobs by name or first letters. To find items in a specific container, enter the name of the container followed by a forward slash, then the blob name or first letters. For example, to show all blobs starting with “a”, type: “myContainer/a”.
- This is where you add the path at which to start collecting the blob information.
The prefix match field presented above is the main point of this article.
Consider a storage account with a container named work and, inside it, a directory named items. Configure the prefix match field as follows to get the needed result:
- Leave it empty to get the information at the storage account level.
- Add the container name to get the information at the container level: prefix match = work/
- Add the directory path to get the information at the directory level: prefix match = work/items/
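The scoping behavior of the prefix match field can be sketched as a simple string-prefix filter: a blob is covered by the rule when its full path starts with the configured prefix. The blob names below are hypothetical, chosen only to illustrate the three levels.

```python
# Hypothetical blob paths in a storage account, for illustration only.
blobs = [
    "work/items/report.csv",
    "work/items/archive/old.csv",
    "work/readme.txt",
    "backup/items/report.csv",
]

def inventory_scope(blob_paths, prefix_match=""):
    """Return the blobs an inventory rule with this prefix would cover.

    An empty prefix covers the whole storage account.
    """
    return [b for b in blob_paths if b.startswith(prefix_match)]

print(len(inventory_scope(blobs)))                 # account level: 4 blobs
print(len(inventory_scope(blobs, "work/")))        # container level: 3 blobs
print(len(inventory_scope(blobs, "work/items/")))  # directory level: 2 blobs
```

Note that the prefix is a plain path prefix, so `work/items/` also covers blobs in nested directories such as `work/items/archive/`.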
The blob inventory execution generates a file named <ruleName>-manifest.json (see the support documentation section for more information about this file). It captures the rule definition provided by the user and the path to the inventory results for that rule, along with the summary information we want, so there is no need to process the inventory result files themselves.
```json
{
  "destinationContainer" : "inventory-destination-container",
  "endpoint" : "https://testaccount.blob.core.windows.net",
  "files" : [
    {
      "blob" : "2021/05/26/13-25-36/Rule_1/Rule_1.csv",
      "size" : 12710092
    }
  ],
  "inventoryCompletionTime" : "2021-05-26T13:35:56Z",
  "inventoryStartTime" : "2021-05-26T13:25:36Z",
  "ruleDefinition" : {
    "filters" : {
      "blobTypes" : [ "blockBlob" ],
      "includeBlobVersions" : false,
      "includeSnapshots" : false,
      "prefixMatch" : [ "penner-test-container-100003" ]
    },
    "format" : "csv",
    "objectType" : "blob",
    "schedule" : "daily",
    "schemaFields" : [
      "Name",
      "Creation-Time",
      "BlobType",
      "Content-Length",
      "LastAccessTime",
      "Last-Modified",
      "Metadata",
      "AccessTier"
    ]
  },
  "ruleName" : "Rule_1",
  "status" : "Succeeded",
  "summary" : {
    "objectCount" : 110000,
    "totalObjectSize" : 23789775
  },
  "version" : "1.0"
}
```
The objectCount value is the total blob count, and the totalObjectSize is the total capacity in bytes.
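Reading these two values out of the manifest can be sketched with the standard library alone. The manifest text below is abbreviated from the example above; in practice you would download the <ruleName>-manifest.json blob from the destination container first.

```python
import json

# Abbreviated copy of the example manifest shown above.
manifest_text = """
{
  "ruleName": "Rule_1",
  "status": "Succeeded",
  "summary": { "objectCount": 110000, "totalObjectSize": 23789775 }
}
"""

manifest = json.loads(manifest_text)
if manifest["status"] == "Succeeded":
    count = manifest["summary"]["objectCount"]          # total blob count
    size_bytes = manifest["summary"]["totalObjectSize"]  # total capacity in bytes
    print(f"{manifest['ruleName']}: {count} blobs, "
          f"{size_bytes / 1024**2:.2f} MiB total")
    # → Rule_1: 110000 blobs, 22.69 MiB total
```

Checking the status field before reading the summary avoids acting on a run that failed or is still in progress.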
Special notes:
- A rule needs to be defined for each path (container or directory) for which you want the total blob count and the total capacity.
- The blob inventory rule generates CSV or Apache Parquet formatted file(s). These files should be deleted if the blob inventory rule is used only to get the information presented in this article.
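A minimal sketch of that cleanup step: keep the manifest (which holds the summary) and flag the result files for deletion. The paths below are illustrative; actually deleting the blobs would require a client library such as azure-storage-blob, which is out of scope here.

```python
# Hypothetical listing of one inventory run's output in the
# destination container, for illustration only.
run_output = [
    "2021/05/26/13-25-36/Rule_1/Rule_1-manifest.json",
    "2021/05/26/13-25-36/Rule_1/Rule_1.csv",
    "2021/05/26/13-25-36/Rule_1/Rule_1_1.csv",
]

def files_to_delete(blob_paths):
    """Everything except the manifest can be removed once the summary is read."""
    return [p for p in blob_paths if not p.endswith("-manifest.json")]

for path in files_to_delete(run_output):
    print("would delete:", path)
```

With a real container, the same predicate could drive `ContainerClient.list_blobs` plus `delete_blob` calls against the destination container.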
Support Documentation
| Topic | Some highlights |
| --- | --- |
| Enable inventory reports | The steps to enable an inventory report. |
| Inventory run schedule | If you configure a rule to run daily, it is scheduled to run every day. If you configure a rule to run weekly, it is scheduled to run each week on Sunday (UTC). The time taken to generate an inventory report depends on various factors; the maximum amount of time an inventory run can take before it fails is six days. |
| Inventory output files | Each inventory rule generates a set of files in the specified inventory destination container for that rule. The inventory output is generated under the path: https://<accountName>.blob.core.windows.net/<inventory-destination-container>/YYYY/MM/DD/HH-MM-SS/<ruleName>. |
| Pricing and billing | Pricing for inventory is based on the number of blobs and containers that are scanned during the billing period. |
| Known Issues and Limitations | This section describes limitations and known issues of the Azure Storage blob inventory feature. |
Disclaimer
- These steps are provided for the purpose of illustration only.
- These steps and any related information are provided "as is" without warranty of any kind, either expressed or implied, including but not limited to the implied warranties of merchantability and/or fitness for a particular purpose.
- We grant You a nonexclusive, royalty-free right to use and modify the steps and to reproduce and distribute the steps, provided that You agree:
- to not use Our name, logo, or trademarks to market Your software product in which the steps are embedded;
- to include a valid copyright notice on Your software product in which the steps are embedded; and
- to indemnify, hold harmless, and defend Us and Our suppliers from and against any claims or lawsuits, including attorneys’ fees, that arise or result from the use or distribution of steps.