Building a Scalable Web Crawling and Indexing Pipeline with Azure Storage and AI Search
In the ever-evolving world of data management, keeping search indexes up to date with dynamic data can be challenging. Traditional approaches, such as manual or scheduled indexing, are resource-intensive, delay-prone, and difficult to scale. Azure Blob Trigger combined with an AI Search indexer offers a way to overcome these challenges, enabling real-time, scalable, and enriched data indexing. This blog explores how Blob Trigger, integrated with Azure AI Search, transforms the indexing process by automating workflows and enriching data with AI capabilities. It walks through configuring Blob Storage, creating Azure Functions for triggers, and connecting to an AI-powered search index. The approach leverages Azure's event-driven architecture, ensuring efficient and cost-effective data management.
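The teaser describes the pipeline but not the code. As a rough illustration of the pattern, the sketch below shows an Azure Functions blob trigger (Python v2 programming model) that pushes each newly crawled blob into an Azure AI Search index; the container name, index name, and field names are assumptions rather than the article's actual configuration.

```python
# Minimal sketch: index a crawled page as soon as it lands in Blob Storage.
# "crawled-pages", "pages-index", and the field names are placeholders.
import os
import azure.functions as func
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient

app = func.FunctionApp()

search_client = SearchClient(
    endpoint=os.environ["SEARCH_ENDPOINT"],      # e.g. https://<service>.search.windows.net
    index_name="pages-index",                    # assumed index name
    credential=AzureKeyCredential(os.environ["SEARCH_API_KEY"]),
)

@app.blob_trigger(arg_name="blob", path="crawled-pages/{name}",
                  connection="AzureWebJobsStorage")
def index_crawled_page(blob: func.InputStream):
    """Fires whenever the crawler writes a new blob and pushes it into the search index."""
    content = blob.read().decode("utf-8", errors="ignore")
    search_client.upload_documents(documents=[{
        "id": blob.name.replace("/", "_"),       # document keys must be URL-safe
        "url": blob.name,
        "content": content,
    }])
```

Because the function reacts to storage events, indexing happens as soon as the crawler writes a blob, with no scheduled re-crawl of the whole container.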
Holding forensic evidence: The role of hybrid cloud in successful preservation and compliance

Disclaimer: The following is a post authored by our partner Tiger Technology. Tiger Technology has been a valued partner in the Azure Storage ecosystem for many years and we are happy to have them share details on their innovative solution!

Police departments worldwide are grappling with a digital explosion. From body camera footage to social media captures, the volume and variety of evidence have surged, creating a storage and management challenge like never before. A single police department may need to store 2–5 petabytes of data and keep some of it for 100 years. How can it preserve the integrity of this data, keep storage cost-effective, and ensure compliance with legal requirements? The answer lies in hybrid cloud solutions, specifically Microsoft Azure Blob Storage paired with Tiger Bridge. These solutions are empowering law enforcement to manage and store evidence at scale without disrupting workflows. But what exactly is hybrid cloud, and why is it a game-changer for digital evidence management?

What is a hybrid cloud?

A hybrid cloud combines public or private cloud services with on-premises infrastructure. It gives organizations the flexibility to mix and match environments, allowing them to choose the best fit for specific applications and data. This flexibility is especially valuable in highly regulated industries like law enforcement, where strict data privacy and compliance rules govern how evidence is stored, processed, and accessed. Hybrid cloud also facilitates a smoother transition to public cloud solutions. For instance, when a data center reaches capacity, hybrid setups allow agencies to scale dynamically while maintaining control over their most sensitive data. It's not just about storage; it's about creating a robust, compliant infrastructure for managing enormous volumes of evidence.

What makes digital evidence so complex?

Digital evidence encompasses any information stored or transmitted in binary form that can be used in court. It includes computer hard drives, phone records, social media posts, surveillance footage, and more. The challenge isn't just collecting this data; it's preserving its integrity. Forensic investigators must adhere to strict chain-of-custody protocols to prove in court that the evidence:
- Is authentic and unaltered,
- Has been securely stored with limited access,
- Is readily available when needed.

With the surge in data volumes and complexity, traditional storage systems often fall short. That's where hybrid cloud solutions shine, offering scalable, secure, and cost-effective options that keep digital evidence admissible while meeting compliance standards.

The challenges police departments face

Digital evidence is invaluable, but storing and managing it is challenging in several respects:

Short-term storage problems
The sheer scale of data can overwhelm local systems. Evidence must first be duplicated using forensic imaging to protect the original file. But housing these duplicates, especially with limited budgets, strains existing resources.

Long-term retention demands
In some jurisdictions, evidence must be retained for decades, sometimes up to a century. Physical storage media, like hard drives or SSDs, degrade over time and are expensive to maintain. Transitioning this data to cloud cold storage offers a more durable and cost-effective solution.

Data integrity and legal admissibility
Even the slightest suspicion of tampering can render evidence inadmissible.
Courts require robust proof of authenticity and integrity, including cryptographic hashes and digital timestamps. Failing to maintain a clear chain of custody could jeopardize critical cases.

Solving the storage puzzle with hybrid cloud

For law enforcement agencies, managing sensitive evidence isn't just about storage; it's about creating a system that safeguards data integrity, ensures compliance, and keeps costs under control. Traditional methods fall short in meeting these demands as the volume of digital evidence continues to grow. This is where hybrid cloud technology stands out, offering a powerful combination of on-premises infrastructure and cloud capabilities. Microsoft Azure, a leader in cloud solutions, brings critical features to the table, ensuring evidence remains secure, accessible, and compliant with strict legal standards. But storage alone isn't enough. Efficient file management is equally crucial for handling vast datasets while maintaining workflow efficiency. Tools like Tiger Bridge complement Microsoft Azure by bridging the gap between local and cloud storage, adding intelligence and flexibility to how evidence is preserved and accessed.

Microsoft Azure Blob Storage

Azure Blob Storage is massively scalable and secure object storage. For law enforcement purposes, among other features, it offers:
- Automatic tiering: Automatically moves data between hot and cold tiers, optimizing costs.
- Durability: Up to sixteen 9s (99.99999999999999%) of durability ensures data integrity for decades.
- Metadata management: Add custom tags or blob index tags, such as police case classifications, to automate retention reviews.

Microsoft Azure ensures evidence is secure, accessible, and compliant with legal standards.

Tiger Bridge: Smart File Management

Tiger Bridge enhances Microsoft Azure's capabilities by seamlessly integrating local and cloud storage with powerful features tailored for forensic evidence management. Tiger Bridge is a software-only solution that integrates seamlessly with Windows servers. It handles file replication, space reclaiming, and archiving, all while preserving existing workflows and ensuring data integrity and disaster recovery. With Tiger Bridge, police departments can transition to hybrid cloud storage without adding hardware or altering processes.

Data replication
Tiger Bridge replicates files from on-premises storage to cloud storage, ensuring a secure backup. Replication policies run transparently in the background, allowing investigators to work uninterrupted. Files are duplicated based on user-defined criteria, such as priority cases or evidence retention timelines.

Space reclamation
Once files are replicated to the cloud, Tiger Bridge replaces local copies with "nearline" stubs. These stubs look like the original files but take up virtually no space. When a file is needed, it's automatically retrieved from the cloud, reducing storage strain on local servers.

Data archiving
For long-term storage, Tiger Bridge moves files from hot cloud tiers to cold and/or archive storage. Files in the archive tier are replaced with "offline" stubs. These files are not immediately accessible but can be manually retrieved and rehydrated when necessary. This capability allows law enforcement agencies to save on costs while still preserving access to critical evidence.
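Tiger Bridge drives this movement automatically, but the underlying Azure Blob operations can be sketched with the Azure SDK. The example below is a minimal illustration of archiving a cold file and later rehydrating it; the container and blob names are placeholders, and this is not Tiger Bridge's own code.

```python
# Minimal sketch of the underlying Azure Blob tiering operations.
# Container and blob names are placeholders.
import os
from azure.storage.blob import BlobServiceClient

service = BlobServiceClient.from_connection_string(os.environ["AZURE_STORAGE_CONNECTION_STRING"])
blob = service.get_blob_client(container="evidence", blob="case-1042/bodycam-2019-03-17.mp4")

# Send a file that has gone cold to the Archive tier (cheapest, not directly readable).
blob.set_standard_blob_tier("Archive")

# Later, when an investigator needs the file again, start rehydration back to a
# readable tier. Rehydration can take hours; "High" priority is also available.
blob.set_standard_blob_tier("Hot", rehydrate_priority="Standard")
```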
Checksum for data integrity

On top of the data integrity and data protection features already built into the Azure Blob Storage service, Tiger Bridge goes a step further by generating checksums for newly replicated files. These cryptographic signatures allow agencies to verify that files in the cloud are identical to the originals stored on premises. This is essential for forensic applications, where the authenticity of evidence must withstand courtroom scrutiny. Data integrity verification is done during uploads and retrievals, ensuring that files remain unaltered while stored in the cloud. For law enforcement, checksum validation provides peace of mind, ensuring that evidence remains admissible in court and meets strict regulatory requirements (a minimal verification sketch appears at the end of this post).

Disaster Recovery

In the event of a local system failure, Tiger Bridge allows for immediate recovery. All data remains accessible in the cloud, and reinstalling Tiger Bridge on a new server re-establishes access without needing to re-download files.

A real-life scenario

Imagine a police department dealing with petabytes of video evidence from body cameras, surveillance footage, and digital device extractions. A typical real-life scenario follows a similar pattern:
- Investigators collect and image evidence files,
- Tiger Bridge replicates this data to Azure Blob Storage, following predefined rules,
- Active cases remain in Azure's hot tier, while archival data moves to cost-effective cold storage,
- Metadata tags in Azure help automate case retention reviews, flagging files eligible for deletion.

This approach ensures evidence is accessible when needed, secure from tampering, and affordable to store long-term.

The results speak for themselves. Adopting a hybrid cloud strategy delivers tangible benefits:
- Operational efficiency: Evidence is readily accessible without the need for extensive hardware investments and maintenance.
- Cost savings: Automating data tiering reduces storage costs while maintaining accessibility.
- Workflow continuity: Investigators can maintain existing processes with minimal disruption.
- Enhanced compliance: Robust security measures and chain-of-custody tracking ensure legal standards are met.

A future-proof solution for digital forensics

As digital evidence grows in both volume and importance, police organizations must evolve their storage strategies. Hybrid cloud solutions like Azure Blob Storage and Tiger Bridge offer a path forward: scalable, secure, and cost-effective evidence management designed for the demands of modern law enforcement. The choice is clear: preserve the integrity of justice by adopting tools built for the future.

About Tiger Technology

Tiger Technology helps organizations with mission-critical deployments optimize their on-premises storage and enhance their workflows through cloud services. The company is a validated ISV partner for Microsoft in three of the five Azure Storage categories: Primary and Secondary Storage; Archive, Backup and BCDR; and Data Governance, Management, and Migration. The Tiger Bridge SaaS offering on Azure Marketplace is Azure benefit-eligible data management software enabling seamless hybrid cloud infrastructure. Installed in the customer's on-premises or cloud environment, Tiger Bridge intelligently connects file data across file and object storage anywhere for data lifecycle management, global file access, disaster recovery, data migration and access to insights.
Tiger Bridge supports all Azure Blob Storage tiers, including the cold and archive tiers for long-term archival of data.

Read more by Tiger Technology on the Tech Community Blog:
- Modernization through Tiger Bridge Hybrid Cloud Data Services
- On-premises-first hybrid workflows in healthcare. Why start with digital pathology?
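The checksum step described earlier can be illustrated in general terms with the Azure SDK. The sketch below computes a SHA-256 digest of a local evidence file, uploads it, and hashes the cloud copy to confirm the two match. The paths and names are placeholders, and this shows the general verify-after-upload idea rather than Tiger Bridge's actual mechanism.

```python
# Rough illustration of post-upload integrity verification. Names are placeholders.
import hashlib
import os
from azure.storage.blob import BlobServiceClient

def sha256_of_file(path: str) -> str:
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()

service = BlobServiceClient.from_connection_string(os.environ["AZURE_STORAGE_CONNECTION_STRING"])
blob = service.get_blob_client(container="evidence", blob="case-1042/disk-image.e01")

local_path = "/forensics/case-1042/disk-image.e01"
local_hash = sha256_of_file(local_path)

# Upload the forensic image, then read it back and hash it to prove the copy
# in Azure is bit-for-bit identical to the original on premises.
with open(local_path, "rb") as data:
    blob.upload_blob(data, overwrite=True)

remote_digest = hashlib.sha256()
for chunk in blob.download_blob().chunks():
    remote_digest.update(chunk)

assert remote_digest.hexdigest() == local_hash, "Checksum mismatch: do not rely on the cloud copy"
```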
Building an AI-Powered ESG Consultant Using Azure AI Services: A Case Study

In today's corporate landscape, Environmental, Social, and Governance (ESG) compliance has become increasingly important for stakeholders. To address the challenges of analyzing vast amounts of ESG data efficiently, a comprehensive AI-powered solution called ESGai has been developed. This blog explores how Azure AI services were leveraged to create a sophisticated ESG consultant for publicly listed companies.

Watch Our Video

The Challenge: Making Sense of Complex ESG Data

Organizations face significant challenges when analyzing ESG compliance data. Manual analysis is time-consuming, prone to errors, and difficult to scale. ESGai was designed to address these pain points by creating an AI-powered virtual consultant that provides detailed insights based on publicly available ESG data.

Solution Architecture: The Three-Agent System

ESGai implements a three-agent architecture, all powered by Azure's AI capabilities:
- Manager Agent: Breaks down complex user queries into manageable sub-questions containing specific keywords that facilitate vector search retrieval. The system prompt includes generalized document headers from the vector database for context.
- Worker Agent: Processes the sub-questions generated by the Manager, connects to the vector database to retrieve relevant text chunks, and answers the sub-questions. Results are stored in Cosmos DB for later use.
- Director Agent: Consolidates the answers from the Worker agent into a comprehensive final response tailored to the user's original query.

It's important to note that while conceptually there are three agents, the Worker is a single agent that gets called multiple times, once for each sub-question generated by the Manager.

Current Implementation State

The current MVP implementation has several limitations that the team plans to address:
- Limited company coverage: The vector database currently stores data for only 2 companies, with 3 documents per company (Sustainability Report, XBRL, and BRSR).
- Single model deployment: Only one GPT-4o model is currently deployed to handle all agent functions.
- Basic storage structure: The Blob container has a simple structure with a single directory. While Azure Blob Storage doesn't natively support hierarchical folders, the team plans to implement virtual folders in the future.
- Free tier limitations: Due to funding constraints, the AI Search service is using the free tier, which limits vector data storage to 50 MB.
- Simplified vector database: The current index stores all 6 files (3 documents × 2 companies) in a single vector database without filtering capabilities or a schema definition.

Azure Services Powering ESGai

The implementation of ESGai leverages multiple Azure services for a robust and scalable architecture:
- Azure AI Services: Provides pre-built APIs, SDKs, and services that incorporate AI capabilities without requiring extensive machine learning expertise, including access to 62 pre-trained models for chat completions through the AI Foundry portal.
- Azure OpenAI: Hosts the GPT-4o model for generating responses and the Ada embedding model for vectorization. The service combines OpenAI's advanced language models with Azure's security and enterprise features.
- Azure AI Foundry: Serves as an integrated platform for developing, deploying, and governing generative AI applications. It offers a centralized management centre that consolidates subscription information, connected resources, access privileges, and usage quotas.
- Azure AI Search (formerly Cognitive Search): Provides both full-text and vector search capabilities, using the OpenAI ada-002 embedding model for vectorization. It is configured with hybrid search (BM25 with Reciprocal Rank Fusion, RRF) for optimal chunk ranking.
- Azure Storage services: Utilizes Blob Storage for storing PDFs, Business Responsibility and Sustainability Reports (BRSRs), and other essential documents. It integrates seamlessly with AI Search using indexers to track database changes.
- Cosmos DB: Employs the MongoDB API within Cosmos DB as a NoSQL database for storing chat history between agents and users.
- Azure App Service: Hosts the web application using a B3-tier plan optimized for cost efficiency, with GitHub Actions integrated for continuous deployment.

Project Evolution: From Concept to Deployment

The development of ESGai followed a structured approach through several phases:

Phase 1: Data cleaning
- Extracted specific KPIs from XML/XBRL datasets and BRSR reports containing ESG data for 1,000 listed companies
- Cleaned and standardized data to ensure consistency and accuracy

Phase 2: RAG framework development
- Implemented Retrieval-Augmented Generation (RAG) to enhance responses by dynamically fetching relevant information
- Created a workflow that includes query processing, data retrieval, and response generation

Phase 3: Initial deployment
- Deployed models locally using Docker and n8n automation tools for testing
- Identified the need for more scalable web services

Phase 4: Transition to Azure services
- Migrated automation workflows from n8n to Azure AI Foundry services
- Leveraged Azure's comprehensive suite of AI services, storage solutions, and app hosting capabilities

Technical Implementation Details

The GPT model is configured with:
- Model version: 2024-11-20
- Temperature: 0.7
- Max response tokens: 800
- Past messages: 10
- Top-p: 0.95
- Frequency/presence penalties: 0

The embedding model uses OpenAI text-embedding-ada-002 with 1536 dimensions and hybrid semantic search (BM25 RRF).

Cost Analysis and Efficiency

A detailed cost breakdown per user query reveals:
- App Server: $390-400
- AI Search: $5 per query
- RAG query processing: $4.76 per query
- Agent-specific costs:
  - Manager: $0.05 (30 input tokens, 210 output tokens)
  - Worker: $3.71 (1,500 input tokens, 1,500 output tokens)
  - Director: $1.00 (600 input tokens, 600 output tokens)

Challenges and Solutions

The team faced several challenges during implementation:
- Quota limitations: Initial deployments encountered token quota restrictions, which were resolved through Azure support requests (typically granted within 24 hours).
- Cost optimization: High costs associated with vectorization required careful monitoring. The team addressed this by shutting down unused services and deploying on services with free tiers.
- Integration issues: GitHub Actions raised errors during deployment, which were resolved using GitHub's App Service Build Service.
- Azure UI complexity: The team noted that Azure AI service naming conventions were sometimes confusing, as the same name is used for both parent and child resources.
- Free tier constraints: The AI Search service's free tier limits vector data storage to 50 MB, which restricts the amount of company information that can be included in the current implementation.
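Putting the pieces above together, a single Worker-agent call could look roughly like the sketch below: embed the sub-question with ada-002, run a hybrid (BM25 plus vector) query against the index, and answer with GPT-4o using the listed settings. The deployment names, index name, and field names are assumptions, not the project's actual values.

```python
# Rough sketch of one Worker-agent call. Deployment, index, and field names are assumptions.
import os
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
from azure.search.documents.models import VectorizedQuery
from openai import AzureOpenAI

openai_client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-06-01",
)
search_client = SearchClient(
    endpoint=os.environ["SEARCH_ENDPOINT"],
    index_name="esg-documents",                      # assumed index name
    credential=AzureKeyCredential(os.environ["SEARCH_API_KEY"]),
)

def worker_answer(sub_question: str) -> str:
    # 1. Vectorize the sub-question with the Ada embedding model (1536 dimensions).
    embedding = openai_client.embeddings.create(
        model="text-embedding-ada-002", input=sub_question
    ).data[0].embedding

    # 2. Hybrid retrieval: keyword (BM25) plus vector similarity, fused by the service.
    results = search_client.search(
        search_text=sub_question,
        vector_queries=[VectorizedQuery(vector=embedding, k_nearest_neighbors=5,
                                        fields="contentVector")],
        top=5,
    )
    context = "\n\n".join(doc["content"] for doc in results)

    # 3. Answer the sub-question with GPT-4o using the configuration listed in the post.
    response = openai_client.chat.completions.create(
        model="gpt-4o",                              # deployment name is an assumption
        temperature=0.7,
        max_tokens=800,
        top_p=0.95,
        messages=[
            {"role": "system", "content": "Answer strictly from the ESG excerpts provided."},
            {"role": "user", "content": f"Excerpts:\n{context}\n\nQuestion: {sub_question}"},
        ],
    )
    return response.choices[0].message.content
```

In the described architecture, the Manager would call a function like this once per sub-question and the Director would consolidate the returned answers.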
Future Roadmap

The current implementation is an MVP with several areas for expansion:
- Expand the database to include more publicly available sustainability reports beyond the current two companies
- Optimize token usage by refining query handling processes
- Research alternative embedding models to reduce costs while maintaining accuracy
- Implement a more structured storage system with virtual folders in Blob Storage
- Upgrade from the free tier of AI Search to support larger data volumes
- Develop a proper schema for the vector database to enable filtering and more targeted searches
- Scale to multiple GPT model deployments for improved performance and redundancy

Conclusion

ESGai demonstrates how advanced AI techniques like Retrieval-Augmented Generation can transform data-intensive domains such as ESG consulting. By leveraging Azure's comprehensive suite of AI services alongside a robust agent-based architecture, this solution provides users with actionable insights while maintaining scalability and cost efficiency.

Watch Our Video
How to Automate Cross-OS File Fixes with Azure Automation and PowerShell

Build a serverless file fixer in Azure using Automation, PowerShell, Blob Storage, and Event Grid. Learn how to set up the necessary resources, configure permissions, and automatically detect and correct cross-OS file issues, such as CRLF vs. LF line endings and file permission mismatches. This streamlined approach saves time and eliminates manual fixes, ensuring smoother, error-free workflows for developers working across different operating systems.
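The full post builds this with PowerShell runbooks in Azure Automation triggered through Event Grid. As a rough illustration of the core fix such a runbook automates, here is a minimal Python sketch that normalizes CRLF line endings in a text blob; the container name and extension filter are assumptions, not the post's actual configuration.

```python
# Minimal sketch of the line-ending fix only (the original post implements this
# with PowerShell in Azure Automation). Container name and extensions are placeholders.
import os
from azure.storage.blob import BlobServiceClient

TEXT_EXTENSIONS = (".sh", ".py", ".txt", ".yaml", ".yml")

service = BlobServiceClient.from_connection_string(os.environ["AZURE_STORAGE_CONNECTION_STRING"])
container = service.get_container_client("incoming-files")

def normalize_line_endings(blob_name: str) -> None:
    """Rewrite a text blob in place with LF line endings if CRLF is found."""
    if not blob_name.lower().endswith(TEXT_EXTENSIONS):
        return
    blob = container.get_blob_client(blob_name)
    data = blob.download_blob().readall()
    if b"\r\n" not in data:
        return                                   # already LF-only, nothing to do
    blob.upload_blob(data.replace(b"\r\n", b"\n"), overwrite=True)

# In a real deployment this would be called from the Event Grid-triggered job,
# with the blob name parsed from the event payload.
normalize_line_endings("scripts/deploy.sh")
```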
Hybrid File Tiering Addresses Top CIO Priorities of Risk Control and Cost Optimization

This article describes how you can leverage Komprise Intelligent Tiering for Azure with any on-premises file storage platform and Azure Blob Storage to reduce your cost by 70% and shrink your ransomware attack surface.

Note: This article has been co-authored by Komprise and Microsoft.

Unstructured data plays a big role in today's IT budgets and risk factors

Unstructured data, which is any data that does not fit neatly into a database or tabular format, has been growing exponentially and is now projected by analysts to be over 80% of business information. Unstructured data is commonly referred to as file data, which is the terminology used for the rest of this article. File data has caught some IT leaders by surprise because it is now consuming a significant portion of IT budgets with no sign of slowing down. File data is expensive to manage and retain because it is typically stored and protected by replication to an identical storage platform, which can be very expensive at scale. We will now review how you can easily identify hot and cold data and transparently tier cold files to Azure to cut costs and shrink ransomware exposure with Komprise.

Why file data is factoring into CIO priorities

CIOs are prioritizing cost optimization, risk management and revenue improvement as key priorities for their data; 56% chose cost optimization as their top priority according to the 2024 Komprise State of Unstructured Data Management survey. This is because file data is often retained for decades, its growth rate is in double digits, and it can easily reach petabytes. Keeping a primary copy, a backup copy and a DR copy means three or more copies of a large volume of file data, which becomes prohibitively expensive. On the other hand, file data has largely been untapped in terms of value, but businesses are now realizing the importance of file data to train and fine-tune AI models. Smart solutions are required to balance these competing requirements.

Why file data is vulnerable to ransomware attacks

File data is arguably the most difficult data to protect against ransomware attacks because it is open to many different users, groups and applications. This increases risk because a single user's or group's mistake can lead to a ransomware infection. If the file is shared and accessed again, the infection can quickly spread across the network undetected. As ransomware lurks, the risk increases. For these reasons, you cannot ignore file data when creating a ransomware defense strategy.

How to leverage Azure to cut the cost and inherent risk of file data retention

You can cut costs and shrink the ransomware attack surface of file data using Azure even when you still require on-premises access to your files. The key is reducing the amount of file data that is actively accessed and thus exposed to ransomware attacks. Since 80% of file data is typically cold and has not been accessed in months (see Demand for cold data storage heats up | TechTarget), transparently offloading these files to immutable storage through hybrid tiering cuts both costs and risks. Hybrid tiering offloads entire files from the data storage, snapshot, backup and DR footprints while your users continue to see and access the tiered files without any change to your application processes or user behavior.
Unlike storage tiering, which is typically offered by the storage vendor and places blocks of files controlled by the storage filesystem in Azure, hybrid tiering operates at the file level and transparently offloads the entire file to Azure while leaving behind a link that looks and behaves like the file itself. Hybrid tiering offloads cold files to Azure to cut costs and shrink the ransomware attack surface:

- Cut 70%+ costs: By offloading cold files and not blocks, hybrid tiering can shrink the amount of data you are storing and backing up by 80%, which cuts costs proportionately. As shown in the example below, you can cut 70% of file storage and backup costs by using hybrid tiering.

| Assumption | Value |
|---|---|
| Amount of data on NAS | 1,024 TB |
| % cold data | 80% |
| Annual data growth rate | 30% |
| On-prem NAS cost/GB/mo | $0.07 |
| Backup cost/GB/mo | $0.04 |
| Azure Blob Cool cost/GB/mo | $0.01 |
| Komprise Intelligent Tiering for Azure/GB/mo | $0.008 |

| | On-Prem NAS | On-Prem NAS + Azure Intelligent Tiering |
|---|---|---|
| Data in on-premises NAS (TB) | 1,024 | 205 |
| Snapshots | 30% | 30% |
| Cost of on-prem NAS, primary site | $1,064,960 | $212,992 |
| Cost of on-prem NAS, DR site | $1,064,960 | $212,992 |
| Backup cost | $460,800 | $42,598 |
| Data on Azure Blob Cool (TB) | 0 | 819 |
| Cost of Azure Blob Cool | $0 | $201,327 |
| Cost of Komprise | $0 | $100,000 |
| Total cost for 1 PB per year | $2,590,720 | $769,909 |
| Savings/PB/yr | | $1,820,811 (70%) |

- Shrink the ransomware attack surface by 80%: Offloading cold files to immutable Azure Blob removes cold files from the active attack surface, eliminating 80% of the storage, DR and backup costs while also providing a potential recovery path if the cold files get infected. By having Komprise tier to immutable Azure Blob with versioning, even if someone tried to infect a cold file, it would be saved as a new version, enabling recovery using an older version (a minimal recovery sketch appears at the end of this article). Learn more about Azure Immutable Blob storage here.

In addition to cost savings and improved ransomware defense, the benefits of hybrid cloud tiering using Komprise and Azure are:
- Leverage existing storage investment: You can continue to use your existing NAS storage and Komprise to tier cold files to Azure. Users and applications continue to see and access the files as if they were still on-premises.
- Leverage Azure data services: Komprise maintains file-object duality with its patented Transparent Move Technology (TMT), which means the tiered files can be viewed and accessed in Azure as objects, allowing you to use Azure data services natively. This enables you to leverage the full power of Azure with your enterprise file data.
- Works across heterogeneous vendor storage: Komprise works across all your file and object storage to analyze and transparently tier data to Azure file and object tiers.
- Ongoing lifecycle management in Azure: Komprise continues to manage the data lifecycle in Azure, so as data gets colder, it can move from Azure Blob Cool to Cold to Archive tier based on policies you control.

Azure and Komprise customers are already using hybrid tiering to improve their ransomware posture while reducing costs. A great example is Katten.

Global law firm saves $900,000 per year and achieves resilient ransomware defense with Komprise and Azure

Katten Muchin Rosenman LLP (Katten) is a full-service law firm delivering legal services across more than a dozen practice areas and sectors, including Aviation, Construction, Energy, Education, Entertainment, Healthcare and Real Estate.
Like many other large law firms, Katten has been seeing an average 20% annual growth in storage for file-related data, resulting in the need to add on-premises storage capacity every 12-18 months. With a focus on managing data storage costs in an environment where data is growing exponentially but cannot be deleted, Katten needed a solution that could provide deep data insights and the ability to move file data as it ages to immutable object storage in the cloud for greater cost savings and ransomware protection. Katten implemented hybrid tiering using Komprise Intelligent Tiering to Azure and leveraged immutable Blob storage to not only save $900,000 annually but also improve its ransomware defense posture. Read how Katten does hybrid tiering to Azure using Komprise.

Summary: Hybrid tiering helps CIOs optimize file costs and cut ransomware risks

Cost optimization and risk management are top CIO priorities. File data is a major contributor to both costs and ransomware risks. Organizations are leveraging Komprise to tier cold files to Azure while continuing to use their on-premises file storage NAS. This provides a low-risk approach with no disruption to users and apps while cutting 70% of costs and shrinking the ransomware attack surface by 80%.

Next steps

To learn more and get a customized assessment of your savings, visit the Azure Marketplace listing or contact azure@komprise.com.
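The recovery path described above relies on Azure Blob versioning. As a rough sketch of what promoting a prior, uninfected version could look like with the Azure SDK (generic Azure usage, not Komprise's own tooling; container and blob names are placeholders, and versioning must already be enabled on the account):

```python
# Rough sketch: recover a prior blob version after suspected tampering.
# Names are placeholders; blob versioning must be enabled on the storage account.
import os
from azure.storage.blob import BlobServiceClient

service = BlobServiceClient.from_connection_string(os.environ["AZURE_STORAGE_CONNECTION_STRING"])
container = service.get_container_client("cold-files")
blob_name = "projects/2019/budget.xlsx"

# List every version of the file; prior versions cannot be modified.
versions = [
    b for b in container.list_blobs(name_starts_with=blob_name, include=["versions"])
    if b.name == blob_name
]
versions.sort(key=lambda b: b.version_id)            # version ids are creation timestamps
for v in versions:
    print(v.version_id, "current" if v.is_current_version else "previous")

# Promote the last known-good version by copying it over the current one.
# (Assumes at least one earlier version exists and that blob.url carries no SAS query string.)
clean_version_id = versions[-2].version_id
blob = container.get_blob_client(blob_name)
blob.start_copy_from_url(f"{blob.url}?versionid={clean_version_id}")
```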
Microsoft Purview Protection Policies for Azure Data Lake & Blob Storage Available in All Regions

Organizations today face a critical challenge: ensuring consistent and automated data governance across rapidly expanding data estates. Driven by the growth of AI and the increasing reliance on vast data volumes for model training, Chief Data Officers (CDOs) and Chief Information Security Officers (CISOs) must prevent unintentional exposure of sensitive data (PII, credit card information) while adhering to data and legal regulations. Many organizations rely on Azure Blob Storage and ADLS for storing vast amounts of data, as they offer scalable, secure, and highly available cloud storage. While solutions like RBAC (role-based access control), ABAC (attribute-based access control), and ACLs (access control lists) offer secure ways to manage data access, they operate on metadata such as file paths, tags, or container names. These mechanisms are effective for implementing restrictive data governance by controlling who can access specific files or containers. However, there are scenarios where automatic access controls based on the sensitivity of the content itself are necessary. For example, identifying and protecting sensitive information like credit card numbers within a blob requires more granular control. Ensuring that sensitive content is restricted to specific roles and applications across the organization is crucial, especially as enterprises focus on building new applications and infusing AI into current solutions. This is where integrated solutions like Microsoft Purview Information Protection (MIP) come into play. MIP protection policies let organizations scan and label data based on the content stored in the blob, so access controls tied directly to the content of a data asset can be applied across storage accounts. By eliminating the need for in-house scanning and labeling, MIP streamlines compliance and helps apply consistent data governance from a centralized solution.

The Solution: Microsoft Purview Information Protection (MIP) Protection Policies for Governance & Compliance

Microsoft Purview Information Protection (MIP) provides an efficient and centralized approach to data protection by automatically restricting access to storage data assets based on sensitivity labels discovered through automated scanning, using protection policies (learn more). This feature builds upon Microsoft Purview's existing capability (learn more) to scan and label sensitive data assets, ensuring robust data protection. This not only enhances data governance but also ensures that data is managed in a way that protects sensitive information, reducing the risk of unauthorized access and maintaining the security and trust of customers.

Enhancing data governance with MIP protection policies

Contoso, a multinational corporation, handles large volumes of data stored in Azure Storage (Blob/ADLS). Different users, such as financial auditors, legal advisors, compliance officers, and data analysts, need access to different blobs in the storage account. These blobs are updated daily with new content, and there can be sensitive data across these blobs. Given the diverse nature of the stored data, Contoso needed an access control method that could restrict access based on data asset sensitivity. For instance, data analysts access the blob named "logs" where log files are uploaded.
If these files contain PII or financial data that should only be accessed by financial officers, the access permissions need to be updated dynamically based on the changing sensitivity of the stored data. MIP protection policies address this challenge efficiently by automatically limiting access to data based on sensitivity labels found through automated scanning.

Key benefits:
- Auto-labeling: Automatically apply sensitivity labels to Azure Storage based on detection of sensitive information types.
- Automated protection: Automatically restrict access to data with specific sensitivity labels, ensuring consistent data protection. Storage data owners can selectively enable specific storage accounts for policy enforcement, providing flexibility and control. For example, a protection policy can restrict access to data labeled "Highly Confidential" to only specific groups or users; in the scenario above, blobs labeled "logs" were accessible only to data analysts. Because labels are updated as content changes, the protection policy can deny access as soon as any "Highly Confidential" data is identified.
- Enterprise-level control: Information protection policies are applied to blobs and resource sets, ensuring that only authorized Microsoft Entra ID users or M365 user groups can access sensitive data. Unauthorized users are prevented from reading the blob or resource set.
- Centralized policy management: Create, manage, and enforce protection policies across Azure Storage from a single, unified interface in Microsoft Purview. Enterprise admins have granular control over which storage accounts enforce protection coverage based on the account's sensitivity label.

By using Microsoft Purview Information Protection (MIP) protection policies, Contoso was able to achieve secure and consistent data governance and centralized policy management, effectively addressing its data security challenges.

Prerequisites
- Microsoft 365 E5 licenses and setup of the pay-as-you-go billing model. To understand pay-as-you-go billing by assets protected, see the pay-as-you-go billing model.
- For information about the specific licenses required, see this information on sensitivity labels. Microsoft 365 E5 trial licenses can be attained for your tenant by navigating here from your environment.

Getting Started

The public preview of protection policies supports the following Azure Storage services:
- Azure Blob Storage
- Azure Data Lake Storage

To enable protection policies for your Azure Storage accounts:
1. Navigate to the Microsoft Purview portal > Information Protection card > Policies.
2. Configure or use an existing sensitivity label in Microsoft Purview Information Protection that is scoped to "Files & other data assets".
3. Create an auto-labeling policy to apply a specific sensitivity label to scoped assets in Azure Storage based on the Microsoft out-of-the-box sensitive info types detected.
4. Run scans on assets for auto-labeling to apply.
5. Create a protection policy and associate it with your desired sensitivity labels.
6. Apply the policy to your Azure Blob Storage or ADLS Gen2 accounts.

Limitations

During the public preview, please note the following limitations:
- Currently a maximum of 10 storage accounts are supported in one protection policy, and they must be selected under Edit for them to be enabled.
- Changing pattern rules will re-apply labels on all storage accounts.
- There might be delays in label synchronization, which could prevent MIP policies from functioning effectively.
- If a storage account has customer-managed keys (CMK) enabled, MIP protection policies will not work on that account.

Next Steps

With the public preview, MIP protection policies are now available in all regions, and any storage account registered in the Microsoft Purview Data Map can have protection policies created and applied to implement consistent data governance across data in Azure Storage. We encourage you to try out this feature and provide feedback. Your input is crucial in shaping this feature as we work towards general availability.
How to Save 70% on File Data Costs

In the final entry in our series on lowering file storage costs, DarrenKomprise shares how Komprise can help lower on-premises and Azure-based file storage costs. Komprise and Azure offer you a means to optimize unstructured data costs now and in the future!
Control geo failover for ADLS and SFTP with unplanned failover

We are excited to announce the general availability of customer-managed unplanned failover for Azure Data Lake Storage and storage accounts with SSH File Transfer Protocol (SFTP) enabled.

What is unplanned failover?

With customer-managed unplanned failover, you are in control of initiating your failover. Unplanned failover allows you to switch your storage endpoints from the primary region to the secondary region. During an unplanned failover, write requests are redirected to the secondary region, which then becomes the new primary region. Because an unplanned failover is designed for scenarios where the primary region is experiencing an availability issue, it happens without the primary region fully completing replication to the secondary region. As a result, during an unplanned failover there is a possibility of data loss. This loss depends on the amount of data that has yet to be replicated from the primary region to the secondary region. Each storage account has a 'last sync time' property, which indicates the last time a full synchronization between the primary and the secondary region was completed. Any data written between the last sync time and the current time may only be partially replicated to the secondary region, which is why unplanned failover may incur data loss.

Unplanned failover is intended to be utilized during a true disaster where the primary region is unavailable. Therefore, once completed, the data in the original primary region is erased, the account is changed to locally redundant storage (LRS), and your applications can resume writing data to the storage account. If the previous primary region becomes available again, you can convert your account back to geo-redundant storage (GRS). Migrating your account from LRS to GRS will initiate a full data replication from the new primary region to the secondary, which has geo-bandwidth costs.

If your scenario involves failing over while the primary region is still available, consider planned failover. Planned failover can be utilized in scenarios including planned disaster recovery testing or recovering from non-storage-related outages. Unlike unplanned failover, the storage service endpoints must be available in both the primary and secondary regions before a planned failover can be initiated. This is because planned failover is a three-step process that includes: (1) making the current primary read-only, (2) syncing all the data to the secondary (ensuring no data loss), and (3) swapping the primary and secondary regions so that writes now land in the new region. In contrast with unplanned failover, planned failover maintains the geo-redundancy of the account, so planned failback does not require a full data copy.

To learn more about planned failover and how it works, see Public Preview: Customer Managed Planned Failover for Azure Storage | Microsoft Community Hub.

To learn more about each failover option and the primary use case for each, see Azure storage disaster recovery planning and failover - Azure Storage | Microsoft Learn.

How to get started?

Getting started is simple. To learn more about the step-by-step process to initiate an unplanned failover, review the documentation: Initiate a storage account failover - Azure Storage | Microsoft Learn (a minimal SDK sketch follows below).

Feedback

If you have questions or feedback, reach out at storagefailover@service.microsoft.com.
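For readers who prefer to script the process, here is a rough sketch using the Azure management SDK for Python: check the account's last sync time to understand potential data loss, then initiate the failover. The resource group and account names are placeholders, and the linked documentation remains the authoritative walkthrough.

```python
# Rough sketch: inspect last sync time, then initiate a customer-managed
# unplanned failover. Resource group and account names are placeholders.
from azure.identity import DefaultAzureCredential
from azure.mgmt.storage import StorageManagementClient

subscription_id = "<subscription-id>"
resource_group = "contoso-rg"              # placeholder
account_name = "contosogrsaccount"         # placeholder

client = StorageManagementClient(DefaultAzureCredential(), subscription_id)

# Check the last sync time first: anything written after this point may be lost,
# because unplanned failover does not wait for replication to finish.
props = client.storage_accounts.get_properties(
    resource_group, account_name, expand="geoReplicationStats"
)
print("Last sync time:", props.geo_replication_stats.last_sync_time)

# Initiate the unplanned failover. The secondary becomes the new primary and the
# account is converted to LRS once the operation completes.
client.storage_accounts.begin_failover(resource_group, account_name).result()
```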