azure storage
122 TopicsSet Up Endpoint DLP Evidence Collection on your Azure Blob Storage
Endpoint Data Loss Prevention (Endpoint DLP) is part of the Microsoft Purview Data Loss Prevention (DLP) suite of features you can use to discover and protect sensitive items across Microsoft 365 services. Microsoft Endpoint DLP allows you to detect and protect sensitive content across onboarded Windows 10, Windows 11 and macOS devices. Learn more about all of Microsoft's DLP offerings. Before you start setting up the storage, you should review Get started with collecting files that match data loss prevention policies from devices | Microsoft Learn to understand the licensing, permissions, device onboarding and your requirements. Prerequisites Before you begin, ensure the following prerequisites are met: You have an active Azure subscription. You have the necessary permissions to create and configure resources in Azure. You have setup endpoint Data Loss Prevention policy on your devices Configure the Azure Blob Storage You can follow these steps to create an Azure Blob Storage using the Azure portal. For other methods refer to Create a storage account - Azure Storage | Microsoft Learn Sign in to the Azure Storage Accounts with your account credentials. Click on + Create On the Basics tab, provide the essential information for your storage account. After you complete the Basics tab, you can choose to further customize your new storage account, or you accept the default options and proceed. Learn more about azure storage account properties Once you have provided all the information click on the Networking tab. In network access, select Enable public access from all networks while creating the storage account. Click on Review + create to validate the settings. Once the validation passes, click on Create to create the storage Wait for deployment of the resource to be completed and then click on Go to resource. Once the newly created Blob Storage is opened, on the left panel click on Data Storage -> Containers Click on + Containers. Provide the name and other details and then click on Create Once your container is successfully created, click on it. Assign relevant permissions to the Azure Blob Storage Once the container is created, using Microsoft Entra authorization, you must configure two sets of permissions (role groups) on it: One for the administrators and investigators so they can view and manage evidence One for users who need to upload items to Azure from their devices Best practice is to enforce least privilege for all users, regardless of role. By enforcing least privilege, you ensure that user permissions are limited to only those permissions necessary for their role. We will use portal to create these custom roles. Learn more about custom roles in Azure RBAC Open the container and in the left panel click on Access Control (IAM) Click on the Roles tab. It will open a list of all available roles. Open context menu of Owner role using ellipsis button (…) and click on Clone. Now you can create a custom role. Click on Start from scratch. We have to create two new custom roles. Based on the role you are creating enter basic details like name and description and then click on JSON tab. JSON tab gives you the details of the custom role including the permissions added to that role. For owner role JSON looks like this: Now edit these permissions and replace them with permissions required based on the role: Investigator Role: Copy the permissions available at Permissions on Azure blob for administrators and investigators and paste it in the JSON section. User Role: Copy the permissions available at Permissions on Azure blob for usersand paste it in the JSON section. Once you have created these two new roles, we will assign these roles to relevant users. Click on Role Assignments tab, then on Add + and on Add role assignment. Search for the role and click on it. Then click on Members tab Click on + Select Members. Add the users or user groups you want to add for that role and click on Select Investigator role – Assign this role to users who are administrators and investigators so they can view and manage evidence User role – Assign this role to users who will be under the scope of the DLP policy and from whose devices items will be uploaded to the storage Once you have added the users click on Review+Assign to save the changes. Now we can add this storage to DLP policy. For more information on configuring the Azure Blob Storage access, refer to these articles: How to authorize access to blob data in the Azure portal Assign share-level permissions. Configure storage in your DLP policy Once you have configured the required permissions on the Azure Blob Storage, we will add the storage to DLP endpoint settings. Learn more about configuring DLP policy Open the storage you want to use. In left panel click on Data Storage -> Containers. Then select the container you want to add to DLP settings. Click on the Context Menu (… button) and then Container Properties. Copy the URL Open the Data Loss Prevention Settings. Click on Endpoint Settings and then on Setup evidence collection for file activities on devices. Select Customer Managed Storage option and then click on Add Storage Give the storage name and copy the container URL we copied. Then click on Save. Storage will be added to the list. Storage will be added to the list for use in the policy configuration. You can add up to 10 URLs Now open the DLP endpoint policy configuration for which you want to collect the evidence. Configure your policy using these settings: Make sure that Devices is selected in the location. In Incident reports, toggle Send an alert to admins when a rule match occurs to On. In Incident reports, select Collect original file as evidence for all selected file activities on Endpoint. Select the storage account you want to collect the evidence in for that rule using the dropdown menu. The dropdown menu shows the list of storages configured in the endpoint DLP settings. Select the activities for which you want to copy matched items to Azure storage Save the changes Please reach out to the support team if you face any issues. We hope this guide is helpful and we look forward to your feedback. Thank you, Microsoft Purview Data Loss Prevention Team4KViews6likes2CommentsHow to get blob Total Blob Count and Total Capacity with Blob Inventory
Approach This article presents how to take advantage of the Blob Inventory service to get the total blob count and the total capacity per storage account, per container or per directory. I will present the steps to create the blob inventory rule and how to get the needed information without having to process the blob inventory rule results just by using the prefix match field. Additional support documentation it is presented at the end of the article. Introduction to the Blob Inventory Service Azure Storage blob inventory provides a list of the containers, blobs, blob versions, and snapshots in your storage account, along with their associated properties. It generates an output report in either comma-separated values (csv) or Apache Parquet format on a daily or weekly basis. You can use the report to audit retention, legal hold or encryption status of your storage account contents, or you can use it to understand the total data size, age, tier distribution, or other attributes of your data. Please find here our documentation about the blob inventory service. On this article, I will focus on using this service to get the blob count and the capacity. Steps to enable inventory report Please see below how to define a blob inventory rule to get the intended information, using the Azure Portal: Sign in to the Azure portal to get started. Locate your storage account and display the account overview. Under Data management, select Blob inventory. Select Add your first inventory rule, if you do not have any rule defined, or select Add a rule, in case that you already have at least one rule defined. Add a new inventory rule by filling in the following fields: Rule name: The name of your blob inventory rule. Container: Container to store the result of the blob inventory rule execution. Object type to inventory: Select blob Blob types: Blob Storage: Select all (Block blobs, Page blobs, Append blobs). Data Lake Storage: Select all (Block blobs, Append blobs). Subtypes: Blob Storage: Select all (Include blob versions, Include snapshots, Include deleted blobs). Data Lake Storage: Select all (Include snapshots, Include deleted blobs). Blob inventory fields: Please find here all custom schema fields supported for blob inventory. In this scenario, we need to select at least the following fields: Blob Storage: Name, Creation-Time, ETag, Content-Length, Snapshot, VersionId, IsCurrentVersion, Deleted, RemainingRetentionDays. Data Lake Storage: Name, Creation-Time, ETag, Content-Length, Snapshot, DeletionId, Deleted, DeletedTime, RemainingRetentionDays. Inventory frequency: A blob inventory run is automatically scheduled every day when daily is chosen. Selecting weekly schedule will only trigger the inventory run on Sundays. A daily execution will return results faster. Export format: The export format. Could be a csv file or a parquet file. Prefix match: Filter blobs by name or first letters. To find items in a specific container, enter the name of the container followed by a forward slash, then the blob name or first letters. For example, to show all blobs starting with “a”, type: “myContainer/a”. Here is the place to add the path where to start collecting the blob information. The step 5.9 presented above (the prefix match field) it is the main point of this article. Considering that we have a Storage Account with a container named work, and a directory named items, inside the container named work. Please see below how to configure the prefix match field to get the needed result: Leave it empty to get the information at the storage account level. Add the container name in the prefix match field to get the information at the container level. Put prefix match = work/ Add the directory path in the prefix match field to get the information at the directory level. Put prefix match = work/items/ The blob inventory execution will generate a file named <ruleName>-manifest.json, please see more information about this file in the support documentation section. This file captures the rule definition provided by the user and the path to the inventory for that rule, and the information that we want without having to process the blob inventory rule files. { "destinationContainer" : "inventory-destination-container", "endpoint" : "https://testaccount.blob.core.windows.net", "files" : [ { "blob" : "2021/05/26/13-25-36/Rule_1/Rule_1.csv", "size" : 12710092 } ], "inventoryCompletionTime" : "2021-05-26T13:35:56Z", "inventoryStartTime" : "2021-05-26T13:25:36Z", "ruleDefinition" : { "filters" : { "blobTypes" : [ "blockBlob" ], "includeBlobVersions" : false, "includeSnapshots" : false, "prefixMatch" : [ "penner-test-container-100003" ] }, "format" : "csv", "objectType" : "blob", "schedule" : "daily", "schemaFields" : [ "Name", "Creation-Time", "BlobType", "Content-Length", "LastAccessTime", "Last-Modified", "Metadata", "AccessTier" ] }, "ruleName" : "Rule_1", "status" : "Succeeded", "summary" : { "objectCount" : 110000, "totalObjectSize" : 23789775 }, "version" : "1.0" } The objectCount value is the total blob count, and the totalObjectSize is the total capacity in bytes. Special notes: A rule needs to be defined for each path (container or directory) to get the total blob count and the total capacity. The blob inventory rule generates a CSV or Apache Parquet formatted file(s). These files should be deleted if the blob inventory rule is only to get the information presented on this article. Support Documentation Topic Some highlights Enable Azure Storage blob inventory reports The steps to enable inventory report. Inventory run If you configure a rule to run daily, then it will be scheduled to run every day. If you configure a rule to run weekly, then it will be scheduled to run each week on Sunday UTC time. The time taken to generate an inventory report depends on various factors and the maximum amount of time that an inventory run can complete before it fails is six days. Inventory output Each inventory rule generates a set of files in the specified inventory destination container for that rule. The inventory output is generated under the following path: https://<accountName>.blob.core.windows.net/<inventory-destination-container>/YYYY/MM/DD/HH-MM-SS/<ruleName where: accountName is your Azure Blob Storage account name. inventory-destination-container is the destination container you specified in the inventory rule. YYYY/MM/DD/HH-MM-SS is the time when the inventory began to run. ruleName is the inventory rule name. Inventory files Each inventory run for a rule generates the following files: Inventory file: An inventory run for a rule generates a CSV or Apache Parquet formatted file. Each such file contains matched objects and their metadata. Checksum file: A checksum file contains the MD5 checksum of the contents of manifest.json file. The name of the checksum file is <ruleName>-manifest.checksum. Generation of the checksum file marks the completion of an inventory rule run Manifest file: A manifest.json file contains the details of the inventory file(s) generated for that rule. The name of the file is <ruleName>-manifest.json. This file also captures the rule definition provided by the user and the path to the inventory for that rule. Pricing and billing Pricing for inventory is based on the number of blobs and containers that are scanned during the billing period. Known Issues and Limitations This section describes limitations and known issues of the Azure Storage blob inventory feature. Disclaimer These steps are provided for the purpose of illustration only. These steps and any related information are provided "as is" without warranty of any kind, either expressed or implied, including but not limited to the implied warranties of merchantability and/or fitness for a particular purpose. We grant You a nonexclusive, royalty-free right to use and modify the Steps and to reproduce and distribute the steps, provided that. You agree: to not use Our name, logo, or trademarks to market Your software product in which the steps are embedded; to include a valid copyright notice on Your software product in which the steps are embedded; and to indemnify, hold harmless, and defend Us and Our suppliers from and against any claims or lawsuits, including attorneys’ fees, that arise or result from the use or distribution of steps.352Views2likes1CommentRemove Unnecessary Azure Storage Account Dependencies in VM Diagnostics
This post explains how to reduce unnecessary Azure Storage Account dependencies—and associated SAS token usage—by simplifying VM diagnostics configurations: specifically, by removing the retiring legacy IaaS Diagnostics extension and migrating VM boot diagnostics from customer-managed Storage Accounts to Microsoft‑managed storage. Using Azure Resource Graph to identify affected virtual machines at scale, the article shows that both changes can be implemented without VM reboots or guest OS impact, reduce storage sprawl and operational overhead, and help organizations stay ahead of platform deprecations, with automation options available to standardize these improvements across environments.SSMS 21/22 Error Upload BACPAC file to Azure Storage
Hello All In my SSMS 20, I can use "Export Data-tier Application" to export an BACPAC file of Azure SQL database and upload to Azure storage in the same machine, the SSMS 21 gives error message when doing the same export, it created the BACPAC files but failed on the last step, "Uploading BACPAC file to Microsoft Azure Storage", The error message is "Could not load file or assembly 'System.IO.Hashing, Version=6.0.0.0, Culture=neutral, PublicKeyToken=cc7b13ffcd2ddd51' or one of its dependencies. The system cannot find the file specified. (Azure.Storage.Blobs)" I tried the fresh installation of SSMS 21 in a brand-new machine (Windows 11), same issue, Can anyone advice? Thanks441Views0likes5Comments[Design Pattern] Handling race conditions and state in serverless data pipelines
Hello community, I recently faced a tricky data engineering challenge involving a lot of Parquet files (about 2 million records) that needed to be ingested, transformed, and split into different entities. The hard part wasn't the volume, but the logic. We needed to generate globally unique, sequential IDs for specific columns while keeping the execution time under two hours. We were restricted to using only Azure Functions, ADF, and Storage. This created a conflict: we needed parallel processing to meet the time limit, but parallel processing usually breaks sequential ID generation due to race conditions on the counters. I documented the three architecture patterns we tested to solve this: Sequential processing with ADF (Safe, but failed the 2-hour time limit). 2. Parallel processing with external locking/e-tags on Table Storage (Too complex and we still hit issues with inserts). 3. A "Fan-Out/Fan-In" pattern using Azure Durable Functions and Durable Entities. We ended up going with Durable Entities. Since they act as stateful actors, they allowed us to handle the ID counter state sequentially in memory while the heavy lifting (transformation) ran in parallel. It solved the race condition issue without killing performance. I wrote a detailed breakdown of the logic and trade-offs here if anyone is interested in the implementation details: https://medium.com/@yahiachames/data-ingestion-pipeline-a-data-engineers-dilemma-and-azure-solutions-7c4b36f11351 I am curious if others have used Durable Entities for this kind of ETL work, or if you usually rely on an external database sequence to handle ID generation in serverless setups? Thanks, Chameseddine168Views0likes1CommentAzure SQL Database : Can I use same primary key column and foreign key column for multiple tables?
CREATE TABLE Table1( PRIMARY KEY (Table1ID), Column2 int ); CREATE TABLE Table2( PRIMARY KEY (Table1ID), Column2 int, FOREIGN KEY (Table1ID) REFERENCES Table1(Table1ID) ); CREATE TABLE Table3( PRIMARY KEY (Table1ID), Column2 int, FOREIGN KEY (Table1ID) REFERENCES Table1(Table1ID) );350Views0likes1CommentAzure Logic App workflow (Standard) Resubmit and Retry
Hello Experts, A workflow is scheduled to run daily at a specific time and retrieves data from different systems using REST API Calls (8-9). The data is then sent to another system through API calls using multiple child flows. We receive more than 1500 input data, and for each data, an API call needs to be made. During the API invocation process, there is a possibility of failure due to server errors (5xx) and client errors (4xx). To handle this, we have implemented a "Retry" mechanism with a fixed interval. However, there is still a chance of flow failure due to various reasons. Although there is a "Resubmit" feature available at the action level, I cannot apply it in this case because we are using multiple child workflows and the response is sent back from one flow to another. Is it necessary to utilize the "Resubmit" functionality? The Retry Functionality has been developed to handle any Server API errors (5xx) that may occur with Connectors (both Custom and Standard), including client API errors 408 and 429. In this specific scenario, it is reasonable to attempt retrying or resubmitting the API Call from the Azure Logic Apps workflow. Nevertheless, there are other situations where implementing the retry and resubmit logic would result in the same error outcome. Is it acceptable to proceed with the Retry functionality in this particular scenario? It would be highly appreciated if you could provide guidance on the appropriate methodology. Thanks -Sri1.1KViews0likes1CommentDeriving expiry days and remaining retention days for blobs through blob inventory
In managing data within Azure blob storage accounts and Azure data lake gen 2 storage accounts, organizations often encounter scenarios where blobs have been deleted but remain in a soft-deleted state. To calculate the remaining retention days for all such blobs across an entire storage account can be a critical requirement for customers seeking to optimize data management and ensure compliance with retention policies. Additionally, certain blobs may have an expiry time set, scheduling their deletion for a future date. To facilitate the identification and monitoring of these blobs and their respective expiry times, a custom query has been written to efficiently list and calculate expiry information, enabling users to proactively manage their storage resources. The expiry time for Azure blobs is set using the Set Blob Expiry operation. This feature is present in only Hierarchical namespace enabled storage accounts. We can set the expiry with below steps: i) Azure Storage action- About Azure Storage Actions - Azure Storage Actions | Microsoft Learn Storage action can be used to set blob expiry, share a high-level snippet for the operation below ii) REST API- Set Blob Expiry (REST API) - Azure Storage | Microsoft Learn to set the expiry time for your blobs. This ensures that each blob has a defined lifecycle and will be deleted after the specified period. In this blog, it is a step-by-step process of listing the expiry time and retention of the blobs using Blob Inventory report and then parsing it using Synapse. 1. Set blob inventory rule Get CSV file from blob inventory run. Go to the container where inventory reports are getting stored. Navigate to the recent date folder and get url of Blob Inventory csv life. Sharing the below snippet for reference: 2. Create an Azure Synapse workspace Next, create an Azure Synapse workspace where you will execute a SQL query to report the inventory results. Create the SQL query: After you create your Azure Synapse workspace, do the following steps. Navigate to https://web.azuresynapse.net. Select the Develop tab on the left edge. Select the large plus sign (+) to add an item. Select SQL script. 3. Use the sample query below to get the expiry time and remaining retention days of blob respectively select LEFT([Name], CHARINDEX('/', [Name]) - 1) AS Container, RIGHT([Name], LEN([Name])- CHARINDEX('/',[Name])) AS Blob, [Expiry-time] from OPENROWSET( bulk '<URL to your inventory CSV file>', format='csv', parser_version='2.0', header_row=true ) as Source For blobs which got deleted directly, you can calculate the remaining retention days since the data is present in soft deleted state and will be deleted permanently after the retention days complete. select LEFT([Name], CHARINDEX('/', [Name]) - 1) AS Container, RIGHT([Name], LEN([Name])- CHARINDEX('/',[Name])) AS Blob, [Expiry-time], RemainingRetentionDays from OPENROWSET( bulk '<URL to your inventory CSV file>', format='csv', parser_version='2.0', header_row=true ) as Source In the above snippet, the Null value represents that the blob is not deleted and no expiry time is set on the blob yet. Please Note: Calculating blob expiry from blob inventory is one way, customer can explore other options such as Powershell and Azure CLI to achieve the same. Reference links:- Set Blob Expiry (REST API) - Azure Storage | Microsoft Learn Create a storage task - Azure Storage Actions | Microsoft Learn Azure Storage blob inventory | Microsoft Learn Calculate blob count and size using Azure Storage inventory | Microsoft Learn407Views0likes0CommentsHow to configure directory level permission for SFTP local user
SFTP is a feature which is supported for Azure Blob Storage with hierarchical namespace (ADLS Gen2 Storage Account). As documented, the permission system used by SFTP feature is different from normal permission system in Azure Storage Account. It’s using a form of identity management called local users. Normally the permission which user can set up on local users while creating them is on container level. But in real user case, it’s usual that user needs to configure multiple local users, and each local user only has permission on one specific directory. In this scenario, using ACLs (Access control lists) for local users will be a great solution. In this blog, we’ll set up an environment using ACLs for local users and see how it meets the above aim. Attention! As mentioned in Caution part of the document, the ACLs for local users are supported, but also still in preview. Please do not use this for your production environment. Preparation Before configuring local users and ACLs, the following things are already prepared: One ADLS Gen2 Storage Account. (In this example, it’s called zhangjerryadlsgen2) A container (testsftp) with two directories. (dir1 and dir2) One file uploaded into each directory. (test1.txt and test2.txt) The file system in this blog is like: Aim The aim is to have user1 which can only list files saved in dir1 and user2 which can only list files saved in dir2. Both of them should be unable to do any other operations in the matching directory (dir1 for user1 and dir2 for user2) and should be unable to do any operations in root directory and the other directory. Configuring local users From Azure Portal, it’s easy to enable SFTP feature and create local users. Here except user1 and user2, another additional user is also necessary. It will be used as the administrator to assign ACLs on user1 and user2. In this blog, it’s called admin. While creating the admin, its landing directory should be the root directory of the container and the permissions should be all given. While creating the user1 and user2, as the permission will be controlled by using ACLs, the containers and permissions should be left empty and the Allow ACL authorization should be checked. The landing directory should be configured to the directory which this user should have permission later. (In this blog, user1 should be on dir1 and user2 should be on dir2.) User1: User2: After local users are created, one more step which is needed before configuring ACL is to note down the user ID of user1 and user2. By clicking the created local user, a page as following to edit local user should show out and the user ID will be included there. In this blog, the user ID of user1 is 1002 and user ID of user2 is 1003. Configuring ACLs Before starting configuring ACLs, clarifying which permissions to assign is necessary. As explained in this document, the ACLs contains three different permissions: Read(R), Write(W) and Execute(X). And from the “Common scenarios related to ACL permissions” part of the same document, there is a table which contains most operations and their corresponding required permissions. Since the aim of this blog is to allow user1 only to list the dir1, according to table, we know that correct permission for user1 should be X on root directory, R and X on dir1. (For user2, it’s X on root directory, R and X on dir2). After clarifying the needed permissions, the next step is to assign ACLs. The first step is to connect to the Storage Account using SFTP as admin: (In this blog, the PowerShell session + OpenSSL is used but it’s not the only way. Users can also use any other way to build SFTP connection to the Storage Account.) Since assigning ACLs for local users is not possible to a specific user, and the owner of root directory is a built-in user which is controlled by Azure, the easiest way here is to give X permissions to all other users. (For concept of other users, please refer to this document) Next step is to assign R and X permission. But considering the same reason, it’s impossible to give R and X permissions for all other users again. Because if it’s done, user1 will also have R and X permissions on dir2, which does not match the aim. The best way here is to change the owner of the directory. Here we should change the owner of dir1 to user1 and dir2 to user2. (By this way, user1 will not have permission to touch dir2.) After above configurations, while connecting to the Storage Account by SFTP connection using user1 and user2, only listing file operation under corresponding directory is allowed. User1: User2: (The following test result proves that only list operation under /dir2 is allowed. All other operations will return permission denied or not found error.) About landing directory What will happen if all other configurations are correct but the landing directory is configured as root directory for user1 or user2? The answer to the above question is quite simple: The configuration will still work, but will impact the user experience. To show the the result of that case, one more local user called user3 with user ID 1005 is created but its landing directory is configured as admin, which is on root directory. The ACL permission assigned on it is same as user2 (change owner of dir2 to user3.) While connecting to the Storage Account by SFTP using user3, it will be landing on root directory. But per ACLs configuration, it only has permission to list files in dir2, hence the operations in root directory and dir1 are expected to fail. To apply further operation, user needs to add dir2/ in the command or cd dir2 at first.3.3KViews1like1Comment