Beyond Basics: Practical scenarios with Azure Storage Actions
If you are new to Azure Storage Actions, check out our GA announcement blog for an introduction. This post is for cloud architects, data engineers, and IT admins who want to automate and optimize data governance at scale.

The Challenge: Modern Data Management at Scale

As organizations generate more data than ever, managing that data efficiently and securely is a growing challenge. Manual scripts, periodic audits, and ad-hoc cleanups can't keep up with the scale, complexity, and compliance demands of today's cloud workloads. Teams need automation that is reliable, scalable, and easy to maintain. Azure Storage Actions delivers on this need by enabling policy-driven automation for your storage accounts. With Storage Actions, you can:

- Automate compliance (e.g., legal holds, retention)
- Optimize storage costs (e.g., auto-tiering, expiry)
- Reduce operational overhead (no more custom cleanup scripts)
- Improve data discoverability (tagging, labeling)

Real-World Scenarios: Unlocking the Power of Storage Actions

Let's explore three practical scenarios where Storage Actions can transform a customer's data management approach. For each, we'll look at the business problem, the traditional approach, and how Storage Actions makes it easier, including the exact conditions and operations that can be used.

Scenario 1: Content Lifecycle for Brand Teams

Business Problem: Brand and marketing teams manage large volumes of creative assets (videos, design files, campaign materials) that evolve through multiple stages and often carry licensing restrictions. These assets need to be retained, frozen, or archived based on their lifecycle and usage rights. Traditionally, teams rely on scripts or manual workflows to manage this, which can be error-prone, slow, and difficult to scale.

How Storage Actions Helps: Azure Storage Actions enables brand teams to automate content lifecycle management using blob metadata and/or index tags. With a single task definition using an IF and ELSE structure, teams can apply different operations to blobs based on their stage, licensing status, and age, without writing or maintaining scripts.

Example in Practice: Say a brand team manages thousands of creative assets (videos, design files, campaign materials), each tagged with blob metadata that reflects its lifecycle stage and licensing status. For instance:

- Assets that are ready for public use are tagged with asset-stage = final
- Licensed or restricted-use content is tagged with usage-rights = restricted

Over time, these assets accumulate in your storage account, and you need a way to:

- Ensure that licensed content is protected from accidental deletion or modification
- Archive older final assets to reduce storage costs
- Apply these rules automatically, without relying on scripts or manual reviews

With Azure Storage Actions, the team can define a single task that evaluates each blob and applies the appropriate operation using a simple IF and ELSE structure:

IF:
- Metadata.Value["asset-stage"] equals "final"
- AND Metadata.Value["usage-rights"] equals "restricted"
- AND creationTime < 60d
THEN:
- SetBlobLegalHold: locks the blob to prevent deletion or modification, ensuring compliance with licensing agreements.
- SetBlobTier to Archive: moves the blob to the Archive tier, significantly reducing storage costs for older content that is rarely accessed.
ELSE:
- SetBlobTier to Cool: if the blob does not meet the above criteria (whether it is a draft, unlicensed, or recently created), it is moved to the Cool tier.
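For contrast with the declarative task above, the custom-script approach it replaces might look roughly like the following minimal sketch with the Azure Blob Storage Python SDK. The account URL, container name, and 60-day threshold are placeholders, legal holds assume the account has immutability support enabled, and the whole point of Storage Actions is that you no longer have to write, schedule, or maintain code like this.

```python
from datetime import datetime, timedelta, timezone
from azure.identity import DefaultAzureCredential
from azure.storage.blob import BlobServiceClient

# Placeholders: substitute your own account URL and container name.
service = BlobServiceClient("https://<account>.blob.core.windows.net",
                            credential=DefaultAzureCredential())
container = service.get_container_client("brand-assets")
cutoff = datetime.now(timezone.utc) - timedelta(days=60)

for blob in container.list_blobs(include=["metadata"]):
    meta = blob.metadata or {}
    blob_client = container.get_blob_client(blob.name)
    if (meta.get("asset-stage") == "final"
            and meta.get("usage-rights") == "restricted"
            and blob.creation_time < cutoff):
        blob_client.set_legal_hold(True)               # freeze licensed, finalized content
        blob_client.set_standard_blob_tier("Archive")  # archive old finals to cut costs
    else:
        blob_client.set_standard_blob_tier("Cool")     # everything else moves to Cool
```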
Once this Storage Action is created and assigned to a storage account, it is scheduled to run automatically every week. During each scheduled run, the task evaluates every blob in the target container or account. For each blob, it checks whether the asset is marked as final, tagged as restricted, and older than 60 days. If all of these conditions are met, the blob is locked with a legal hold to prevent accidental deletion and then archived to optimize storage costs. If the blob does not meet all of these criteria, it is moved to the Cool tier, ensuring it remains accessible but is stored more economically. This weekly automation ensures that every asset is managed appropriately based on its metadata, without requiring manual intervention or custom scripts.

Scenario 2: Audit-Proof Model Training

Business Problem: In machine learning workflows, ensuring the integrity and reproducibility of training data is critical, especially when models influence regulated decisions in sectors like automotive, finance, healthcare, or legal compliance. Months or even years after a model is deployed, auditors or regulators may request proof that the training data used has not been altered since the model was built. Traditionally, teams try to preserve training datasets by duplicating them into backup storage, applying naming conventions, and manually restricting access. These methods are error-prone, hard to enforce at scale, and lack auditability.

How Storage Actions Helps: Storage Actions enables teams to automate the preservation of validated training datasets using blob tags and immutability policies. Once a dataset is marked as clean and ready for training, Storage Actions can automatically:

- Lock the dataset using a time-based immutability policy
- Apply a tag to indicate it is a snapshot version

This ensures that the dataset cannot be modified or deleted for the duration of the lock, and that it is easily discoverable for future audits.

Example in Practice: Say an ML data pipeline tags a dataset with stage = clean after it passes validation and is ready for training. Storage Actions detects this tag and springs into action. It enforces a one-year immutability policy, which means the dataset is locked and cannot be modified or deleted for the next 12 months. It also applies the tag snapshot = true, making the dataset easy to locate and reference in future audits or investigations. The following conditions and operations define the task logic:

IF:
- Tags.Value[stage] equals 'clean'
THEN:
- SetBlobImmutabilityPolicy for 1 year: adds a write once, read many (WORM) immutability policy to the blob to prevent deletion or modification, ensuring compliance.
- SetBlobTags with snapshot=true: adds a blob index tag with the name "snapshot" and the value "true".

Whenever this task runs on its scheduled interval (such as daily or weekly) and detects that a blob carries the tag stage = 'clean', it automatically initiates the configured operations. In this case, Storage Actions applies a SetBlobImmutabilityPolicy on the blob for one year and adds a snapshot=true tag for easy identification. This means that, without any manual intervention:

- The blob is made immutable for 12 months, preventing any modifications or deletions during that period.
- A snapshot=true tag is applied, making it easy to locate and audit later.
- No scripts, manual tagging, or access restrictions are needed to enforce data integrity.

This ensures that validated training datasets are preserved in a tamper-proof state, satisfying audit and compliance requirements. A script-level sketch of the same two operations follows.
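As a rough illustration only (the container name is an assumption, and the storage account must have version-level immutability support enabled), the two operations in the THEN branch map to SDK calls like these:

```python
from datetime import datetime, timedelta, timezone
from azure.identity import DefaultAzureCredential
from azure.storage.blob import BlobServiceClient, ImmutabilityPolicy

service = BlobServiceClient("https://<account>.blob.core.windows.net",
                            credential=DefaultAzureCredential())
container = service.get_container_client("training-datasets")

for blob in container.list_blobs(include=["tags"]):
    if (blob.tags or {}).get("stage") != "clean":
        continue
    blob_client = container.get_blob_client(blob.name)
    # Time-based WORM lock: no modification or deletion for one year.
    blob_client.set_immutability_policy(
        ImmutabilityPolicy(
            expiry_time=datetime.now(timezone.utc) + timedelta(days=365),
            policy_mode="Unlocked",  # "Locked" cannot be shortened once set
        )
    )
    # Index tag that makes the preserved snapshot easy to find later.
    blob_client.set_blob_tags({"stage": "clean", "snapshot": "true"})
```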
This approach also reduces operational overhead by automating what would otherwise be a complex and error-prone manual process.

Scenario 3: Embedding Management in AI Workflows

Business Problem: Modern AI systems, especially those using Retrieval-Augmented Generation (RAG), rely heavily on vector embeddings to represent and retrieve relevant context from large document stores. These embeddings are often generated in real time, chunked into small files, and stored in vector databases or blob storage. As usage scales, these systems generate millions of small embedding files, many of which become obsolete quickly due to frequent updates, re-indexing, or model version changes. This silent accumulation of stale embeddings leads to:

- Increased storage costs
- Slower retrieval performance
- Operational complexity in managing cleanup timing

Traditionally, teams write scripts to purge old embeddings based on timestamps, run scheduled jobs, and manually monitor usage. This approach is brittle and does not scale well.

How Storage Actions Helps: Storage Actions enables customers to automate the management of embeddings using blob tags and metadata. With blobs identified by tags and metadata such as embeddings=true and modelVersion=latest, customers can define conditions that automatically delete stale embeddings without writing custom scripts.

Example in Practice: In production RAG systems, embeddings are frequently regenerated to reflect updated content, new model versions, or refined chunking strategies. For example, a customer support chatbot may re-index its knowledge base daily to ensure responses are grounded in the latest documentation. To avoid bloating storage with outdated vector embeddings, Storage Actions can automate cleanup with task conditions and an operation such as:

IF:
- Tags.Value[embeddings] equals 'true'
- AND NOT Tags.Value[version] equals 'latest'
- AND creation time < 12 days ago
THEN:
- DeleteBlob: deletes all blobs that match the IF condition criteria.

Whenever this Storage Action runs on its scheduled interval (such as daily), it scans for blobs that are tagged embeddings = 'true', are not the latest version, and are more than 12 days old, and it automatically initiates the configured operation. In this case, Storage Actions performs a DeleteBlob operation on the blob. This means that, without any manual intervention:

- Stale embeddings are deleted.
- No scripts or scheduled jobs are needed to track them.

This ensures that only the most recent model's embeddings are retained, keeping the vector store lean and performant. It also reduces storage costs by eliminating obsolete data and helps maintain retrieval accuracy by ensuring outdated embeddings do not interfere with current queries.

Applying Storage Actions to Storage Accounts

To apply any of these scenarios, customers create an assignment during storage task resource creation. In the assignment creation flow, they select the appropriate role and configure filters and trigger details. For example, a compliance cleanup scenario might run across the entire storage account on a recurring schedule every seven days to remove non-compliant blobs. A cost optimization scenario could target a specific container using a blob prefix and run as a one-time task to archive older blobs. A bulk tag update scenario would typically apply to all blobs without filtering and use a recurring schedule to keep tags consistent.
After setting start and end dates, specifying the export container, and enabling the task, clicking Add queues the action to run on the account.

Learn More

If you are interested in exploring Storage Actions further, there are several resources to help you get started and deepen your understanding:

- Documentation on getting started: https://learn.microsoft.com/en-us/azure/storage-actions/storage-tasks/storage-task-quickstart-portal
- Create a Storage Action from the Azure portal: https://portal.azure.com/#create/Microsoft.StorageTask
- Azure Storage Actions pricing: https://azure.microsoft.com/en-us/pricing/details/storage-actions/#pricing
- Azure blog with the GA announcement: https://azure.microsoft.com/en-us/blog/unlock-seamless-data-management-with-azure-storage-actions-now-generally-available/
- Azure skilling video with a walkthrough of Storage Actions: https://www.youtube.com/watch?v=CNdMFhdiNo8

Have questions, feedback, or a scenario to share? Drop a comment below or reach out to us at storageactions@microsoft.com. We would love to hear how you are using Storage Actions and what scenarios you would like to see next!
Introducing Cross Resource Metrics and Alerts Support for Azure Storage

Aggregate and analyze storage metrics from multiple storage accounts in a single chart.

We're thrilled to announce a highly requested feature: Cross Resource Metrics and Alerts support for Azure Storage! With this new capability, you can now monitor and visualize metrics across multiple storage accounts in a single chart and configure alerts across multiple accounts within the same subscription and region. This makes managing large fleets of storage accounts significantly easier and more powerful.

What's New

- Cross Resource Metrics Support: Visualize aggregated metric data across multiple storage accounts, and break down metrics by individual resources in a sorted and ordered way.
- Cross Resource Alerting Support: Create a single alert rule that monitors a metric across many storage accounts and triggers an action when thresholds are crossed on any resource.
- Full Metric Namespace Support: Works across the Blob, File, Table, and Queue metric namespaces. All existing storage metrics are supported for cross resource visualization and alerting.

Why This Matters

- Centralized monitoring for large environments: Manage and monitor dozens (or hundreds) of storage accounts at once with a unified view.
- Fleet-wide alerting: Set up a single alert that covers your whole storage fleet, ensuring you are quickly notified if any account experiences performance degradation or other issues.
- Operational efficiency: Helps operations teams scale monitoring efforts without needing to configure and manage separate dashboards and alerts for each account individually.

How To Get Started

Step 1: Create a Cross Resource Metrics Chart

- Go to Azure Monitor -> Metrics.
- Scope selection: Under Select a scope, select the same metric namespace (blob/file/queue/table) for multiple storage accounts from the same subscription and region, then click Apply. In the example below, two storage accounts have been selected for metrics in the blob metric namespace.
- Configure the metric chart: Select a metric (e.g., Blob Capacity, Transactions, Ingress).
- Aggregation: By default, a Split by clause on ResourceId is applied to view individual account breakdowns. Alternatively, view aggregated data across all selected accounts by removing the Split by clause.

Example

As another example, let's monitor total transactions across storage accounts on the Hot tier to view an aggregate or per-account breakdown in a single graph.

- From the same view, select the Transactions metric instead.
- Select 5 storage accounts by using the Add filter clause and filtering by the ResourceId property.
- Add another filter and select a specific tier, say Hot. This shows aggregated transactions per minute on Hot-tier data across all selected storage accounts.
- Select Apply splitting and select ResourceId to view an ordered breakdown of transactions per minute for all the storage accounts in scope. In this specific example, only 4 storage accounts are shown, since 1 storage account is excluded by the tier filter.

Step 2: Set Up Cross Resource Alert Rules

- Click New alert rule from the chart view shown above to create an alert that spans the 5 storage accounts above and fires when any account breaches a certain transactions limit over a 5-minute period.
- Configure the required values for the Threshold, Unit, and "Value is" fields. This defines when the alert should fire (e.g., Transactions > 5000).
- Under the Split by dimensions section, ensure that the Microsoft.ResourceId dimension is not included.
- Under Actions, attach an action group (email, webhook, Logic App, etc.).
- Review and create.

Final Thoughts

Cross Resource Metrics and Alerts for Azure Storage makes monitoring and management at scale much more intuitive and efficient. Whether you're overseeing 5 storage accounts or 500, you can now visualize performance and respond to issues faster than ever. And you can do it for metrics across multiple storage services, including blobs, queues, files, and tables! We can't wait to hear how you use this feature! Let us know your feedback by commenting below or by visiting Azure Feedback.
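If you also want to pull the same cross-account numbers programmatically (for a report or a custom dashboard), a minimal sketch with the azure-monitor-query Python library might look like the following. The resource IDs and time window are placeholders, and the portal workflow above does not require any of this.

```python
from datetime import timedelta
from azure.identity import DefaultAzureCredential
from azure.monitor.query import MetricsQueryClient, MetricAggregationType

client = MetricsQueryClient(DefaultAzureCredential())

# Placeholder resource IDs for the storage accounts to compare.
account_ids = [
    "/subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.Storage/storageAccounts/account1",
    "/subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.Storage/storageAccounts/account2",
]

for resource_id in account_ids:
    result = client.query_resource(
        resource_id,
        metric_names=["Transactions"],
        timespan=timedelta(hours=1),
        granularity=timedelta(minutes=5),
        aggregations=[MetricAggregationType.TOTAL],
    )
    total = sum(
        point.total or 0
        for metric in result.metrics
        for series in metric.timeseries
        for point in series.data
    )
    print(f"{resource_id.rsplit('/', 1)[-1]}: {total:.0f} transactions in the last hour")
```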
Differences between PowerShell and Browser when upload file

Hi All,

Has anybody noticed similar behavior? Uploading a file into the storage account through the browser works fine. But if, on the same workstation, you try to do the same using the PowerShell command Set-AzStorageBlobContent, it fails with ErrorCode: AuthorizationPermissionMismatch.

Here is the longer trace:

$sa = Get-AzStorageAccount -ResourceGroupName RG01 -Name storage01
$strCTX = New-AzStorageContext -StorageAccountName $sa.StorageAccountName
$strCTX | Set-AzStorageBlobContent -File C:\temp\test.txt -Container delate -Blob test.txt -Verbose
VERBOSE: Performing the operation "Set" on target "test.txt".
Set-AzStorageBlobContent: This request is not authorized to perform this operation using this permission. HTTP Status Code: 403 - HTTP Error Message: This request is not authorized to perform this operation using this permission.
ErrorCode: AuthorizationPermissionMismatch
ErrorMessage: This request is not authorized to perform this operation using this permission.
RequestId: 3150eeb6-761e-0096-2edd-56e8bc000000
Time: Tue, 30 Sep 2025 10:25:51 GMT
VERBOSE: Transfer Summary
--------------------------------
Total: 1. Successful: 0. Failed: 1.

Something that makes this even more odd: when I look at the roles and their data access on both, they look like the following (screenshot not included). So I'm not even sure how I have access to that storage account in the first place.
Protect your Storage accounts using network security perimeter - now generally available

We are excited to announce the general availability of network security perimeter support for Azure Storage accounts. A network security perimeter allows organizations to define a logical network isolation boundary for Platform-as-a-Service (PaaS) resources, such as Azure Storage accounts, that are deployed outside your organization's virtual networks. It restricts public network access to PaaS resources by default and provides secure communication between resources within the perimeter. Explicit inbound and outbound rules allow access to authorized resources.

Securing data within storage accounts requires a multi-layered approach, encompassing network access controls, authentication, and authorization mechanisms. Network access controls for storage accounts fall into two broad categories: access from PaaS resources and access from all other resources. For access from PaaS resources, organizations can leverage either broad controls through Azure "trusted services" or granular access using resource instance rules. For other resources, access control may involve IP-based firewall rules, allowing virtual network access, or enabling private endpoints. However, the complexity of managing all of this can pose significant challenges when scaled across large enterprises. Misconfigured firewalls, public network exposure on storage accounts, or excessively permissive policies heighten the risk of data exfiltration. It is often challenging to audit these risks at the application or storage account level, making it difficult to identify open exfiltration paths across all PaaS resources in an environment.

Network security perimeters offer an effective solution to these concerns. First, by grouping assets such as Azure Key Vault and Azure Monitor in the same perimeter as your storage accounts, communications between these resources are secured while public access is disabled by default, thereby preventing data exfiltration to unauthorized destinations. Second, they centralize the management of network access controls across numerous PaaS resources at scale by providing a single pane of glass. This approach promotes consistency in settings and reduces administrative overhead, thereby minimizing the potential for configuration errors. Additionally, they provide comprehensive control over both inbound and outbound access to authorized resources across all the associated PaaS resources.

How do network security perimeters protect Azure Storage accounts?

Network security perimeters support granular resource access using profiles. All inbound and outbound rules are defined on a profile, and the profile can be applied to a single resource or to multiple resources within the perimeter. Network security perimeters provide two primary operating modes: "Transition" mode (formerly referred to as "Learning" mode) and "Enforced" mode. "Transition" mode acts as the initial phase when onboarding a PaaS resource into any network security perimeter. When combined with logging, this mode enables you to analyze current access patterns without disrupting existing connectivity. In "Enforced" mode, all defined perimeter rules replace all resource-specific rules except private endpoints. After analyzing logs in "Transition" mode, you can tweak your perimeter rules as necessary and then switch to "Enforced" mode.

Benefits of network security perimeters

Secure resource-to-resource communication: Resources in the same perimeter communicate securely, keeping data internal and blocking unauthorized transfers.
For example, an application's storage account and its associated database, when part of the same perimeter, can communicate securely. However, all communication to the storage account from another database outside of the perimeter will be blocked.

Centralized network isolation: Administrators can manage firewall and resource access policies centrally in network security perimeters across all their PaaS resources in a single pane of glass, streamlining operations and minimizing errors.

Prevent data exfiltration: Centralized access control and logging of inbound and outbound network access attempts across all resources within a perimeter enables comprehensive visibility for compliance and auditing purposes and helps address data exfiltration.

Seamless integration with existing Azure features: Network security perimeter works in conjunction with private endpoints by allowing private endpoint traffic to storage accounts within a perimeter.

There is no additional cost to using network security perimeter.

Real-world customer scenarios

Let us explore how network security perimeters specifically strengthen the security and management of Azure Storage accounts through common applications.

Create a Secure Boundary for Storage Accounts

A leading financial organization sought to enhance the protection of sensitive client data stored in Azure Storage accounts. The company used Azure Monitor with a Log Analytics workspace to collect and centralize logs from all storage accounts, enabling constant monitoring and alerts for suspicious activity. This supported compliance and rapid incident response. They also used Azure Key Vault to access customer-managed encryption keys. They configured network access controls on each communication path from these resources to the storage account. They disabled public network access and employed a combination of virtual network (VNet) rules, firewall rules, private endpoints, and service endpoints. However, this created significant overhead that had to be continuously managed whenever additional resources required access to the storage account.

To address this, the company implemented network security perimeters and blocked public and untrusted access to their storage accounts by default. By placing the specific Azure Key Vault and Log Analytics workspace within the same network security perimeter as the storage account, the organization achieved a secure boundary around their data in an efficient manner. Additionally, to let an authorized application access this data, they defined an inbound access rule in the profile governing their storage account, thereby restricting the application's access to only the required PaaS resources.

Prevent Data Exfiltration from Storage Accounts

One of the most dangerous data exfiltration attacks occurs when an attacker obtains the credentials of a user account with access to an Azure Storage account, perhaps through phishing or credential stuffing. In a traditional setup, this attacker could potentially connect from anywhere on the internet and initiate large-scale data exfiltration to external servers, putting sensitive business or customer information at risk. With a network security perimeter in place, however, only resources within the perimeter or authorized external resources can access the storage account, drastically limiting the attacker's options. Even with valid credentials, network security perimeter rules block the attacker's attempts to connect from an unapproved network or from unapproved machines within a compromised network.
Furthermore, the perimeter enforces strict outbound traffic controls: storage accounts inside the perimeter cannot send data to any external endpoint unless a specific outbound rule permits it. Restricting inbound access and tightly controlling outbound data flows enhances the security of sensitive data in Azure Storage accounts. The presence of robust network access control on top of storage account credentials creates multiple hurdles for attackers to overcome and significantly reduces the risk of both unauthorized access and data exfiltration.

Unified Access Management across the entire Storage estate

A large retailer found it difficult to manage multiple Azure Storage accounts. Typically, updating firewall rules or access permissions involved making repeated changes for each account or using complex scripts to automate the process. This approach not only increased the workload but also raised the risk of inconsistent settings or misconfigurations, which could potentially expose data. With network security perimeters, the retailer grouped storage accounts under a perimeter, sometimes placing subsets of accounts under different perimeters. For accounts requiring special permissions within a single perimeter, the organization created separate profiles to customize inbound and outbound rules specific to them. Administrators could now define and update access policies at the profile level, with rules immediately enforced across every storage account and other resource associated with the profile. The updates applied consistently to all resources, both for blocking public internet access and for allowing specific internal subscriptions, thus reducing gaps and simplifying operations.

The network security perimeter also provided a centralized log of all network access attempts on storage accounts, eliminating the need for security teams to pull logs separately from each account. It showed which calls accessed accounts, when, and from where, starting immediately after enabling logs in "Transition" mode and continuing into "Enforced" mode. This streamlined approach enhanced the organization's compliance reporting, accelerated incident response, and improved understanding of information flow across the cloud storage environment.

Getting started

Explore this Quickstart guide to implement a network security perimeter and configure the right profiles for your storage accounts. For guidance on usage and limitations related to storage accounts, refer to the documentation. There is no additional cost for using network security perimeter. As you begin, consider which storage accounts to group under a perimeter and how to segment profiles for special access needs within the perimeter.
How Microsoft Azure and Qumulo Deliver a Truly Cloud-Native File System for the Enterprise

Disclaimer: The following is a post authored by our partner Qumulo. Qumulo has been a valued partner in the Azure Storage ecosystem for many years and we are happy to share details on their unique approach to solving challenges of scalable filesystems!

Whether you're training massive AI models, running HPC simulations in life sciences, or managing unstructured media archives at scale, performance is everything. Qumulo and Microsoft Azure deliver the cloud-native file system built to handle the most data-intensive workloads, with the speed, scalability, and simplicity today's innovators demand. But supporting modern workloads at scale is only part of the equation. Qumulo and Microsoft have resolved one of the most entrenched and difficult challenges in modernizing the enterprise data estate: empowering file data with high performance across a global workforce without impacting the economics of unstructured data storage.

According to Gartner, global end-user spending on public cloud services is set to surpass $1 trillion by 2027. That staggering figure reflects more than just a shift in IT budgets; it signals a high-stakes race for relevance. CIOs, CTOs, and other tech-savvy execs are under relentless pressure to deliver the capabilities that keep businesses profitable and competitive. Whether they're ready or not, the mandate is clear: modernize fast enough to keep up with disruptors, many of whom are using AI and lean teams to move at lightning speed. To put it simply, grow margins without getting outpaced by a two-person startup using AI in a garage. That's the challenge leaders face every day.

Established enterprises must contend with the duality of maintaining successful existing operations and the potential disruption to those operations by a more agile business model that offers insight into the next wave of customer desires and needs. Nevertheless, established enterprises have a winning move: unleash the latent productivity increases and decision-making power hidden within years, if not decades, worth of data. Thoughtful CIOs, CTOs, and CXOs have elected to move slowly in these areas due to the tyranny of quarterly results and the risk of short-term costs reflecting poorly on the present at the expense of the future. In this sense, adopting innovative technologies forced organizations to choose between self-disruption with long-term benefits or non-disruptive technologies with long-term disruption risk. When it comes to network-attached storage, CXOs were forced to accept non-disruptive technologies because the risk was too high.

This trade-off is no longer required. Microsoft and Qumulo have addressed this challenge in the realm of unstructured file data technologies by delivering a cloud-native architecture that combines proven Azure primitives with Qumulo's suite of file storage solutions. Now, those patient CXOs who have been waiting to adopt hardened technologies can shift their file data paradigm into Azure while improving business value and data portability, and reducing the financial burden on their business units.
Today, organizations ranging from enterprises with 50,000+ employees and global offices to companies with a few dozen employees and unstructured data-centric operations have discovered the incredible performance increases, data availability, accessibility, and economic savings realized when file data moves into Azure using one of two Qumulo solutions:

- Option 1: Azure Native Qumulo (ANQ) is a fully managed file service that delivers truly elastic capacity, throughput, and IOPS, along with all the enterprise features of your on-premises NAS and a TCO to match.
- Option 2: Cloud Native Qumulo (CNQ) on Microsoft Azure is a self-hosted file data service that offers the performance and scale your most demanding workloads require, at a total cost of ownership comparable to on-premises storage.

Both CNQ on Microsoft Azure and ANQ offer the flexibility and capacity of object storage while remaining fully compatible with file-based workflows. As data platforms purpose-built for the cloud, CNQ and ANQ provide three key characteristics:

- Elasticity: Performance and capacity can scale independently, both up and down, dynamically.
- Boundless scale: Virtually no limitations on file system size or file count, with full multi-protocol support.
- Utility-based pricing: Like Microsoft Azure, Qumulo operates on a pay-as-you-go model, charging only for resources used without requiring pre-provisioned capacity or performance.

The collaboration between Qumulo's cloud-native file solutions and the Microsoft Azure ecosystem enables seamless migration of a wide range of workflows, from large-scale archives to high-performance computing (HPC) applications, from on-premises environments to the cloud. For example, a healthcare organization running a fully cloud-hosted Picture Archiving and Communication System (PACS) alongside a Vendor Neutral Archive (VNA) can leverage Cloud Native Qumulo (CNQ) to manage medical imaging data in Azure. CNQ offers a HIPAA-compliant, highly durable, and cost-efficient platform for storing both active and infrequently accessed diagnostic images, enabling secure access while optimizing storage costs. With Azure's robust cloud infrastructure, organizations can design a cloud file solution that scales to meet virtually any size or performance requirement, while unlocking new possibilities in cloud-based AI and HPC workloads.

Further, using the Qumulo Cloud Data Fabric, the enterprise is able to connect geographically separated data sources within one unified, strictly consistent (POSIX-compliant), secure, and high-performance file system. As organizational needs evolve, whether new workloads are added or existing workloads expand, Cloud Native Qumulo or Azure Native Qumulo can easily scale to meet performance demands while maintaining predictable economics that fit existing or shrinking budgets.

About Azure Native Qumulo and Cloud Native Qumulo

Azure Native Qumulo (ANQ) and Cloud Native Qumulo (CNQ) enable organizations to leverage a fully customizable, multi-protocol solution that dynamically scales to meet workload performance requirements. Engineered specifically for the cloud, ANQ is designed for simplicity of operation and automatic scalability as a fully managed service.
CNQ offers the same great technology, directly leveraging cloud-native resources like Azure Virtual Machines (VMs), Azure Networking, and Azure Blob Storage to provide a scalable platform that adapts to the evolving needs of today's workloads. It deploys entirely in the enterprise tenant, allows direct control over the underlying infrastructure, and requires a somewhat higher level of internal expertise to operate.

Azure Native Qumulo and Cloud Native Qumulo on Azure also deliver a fully dynamic file storage platform that is natively integrated with the Microsoft Azure backend. Here's what sets ANQ and CNQ apart:

- Elastic scalability: Each ANQ and CNQ instance on Azure Blob Storage can automatically scale to exabyte-level storage within a single namespace by simply adding data. On Microsoft Azure, performance adjustments are straightforward: just add or remove compute instances to instantly boost throughput or IOPS, all without disruption and within minutes. Plus, you pay only for the capacity and compute resources you use.
- Deployed in minutes: ANQ deploys from the Azure portal, CLI, or PowerShell, just like a native service. CNQ runs in your own Azure virtual network and can be deployed via Terraform. You can select the compute type that best matches your workload's performance requirements and build a complete file data platform on Azure in under six minutes for a three-node cluster.
- Automatic TCO management: Cost optimization can be facilitated through services like Komprise Intelligent Tiering for Azure and Azure Blob Storage access tiers, which optimize storage costs and manage the data lifecycle. By analyzing data access patterns, these systems move files or objects to appropriate tiers, reducing costs for infrequently accessed data. Additionally, all data written to CNQ is compressed to ensure maximum cost efficiency.

ANQ automatically adapts to your workload requirements, and CNQ's fully customizable architecture can be configured to meet the specific throughput and IOPS requirements of virtually any file or object-based workload. You can purchase either ANQ or CNQ through a pay-as-you-go model, eliminating the need to pre-provision cloud file services; simply pay for what you use. ANQ and CNQ deliver performance and services comparable to on-premises file storage at a similar TCO.

Qumulo's cloud-native architecture redefines cloud storage by decoupling capacity from performance, allowing both to be adjusted independently and on demand. This provides the flexibility to modify components such as compute instance type, compute instance count, and cache disk capacity, enabling rapid, non-disruptive performance adjustments. This architecture, which includes the innovative Predictive Cache, delivers exceptional elasticity and virtually unlimited capacity. It ensures that businesses can efficiently manage and scale their data storage as their needs evolve, without compromising performance or reliability. ANQ and CNQ retain all the core Qumulo functionalities, including real-time analytics, robust data protection, security, and global collaboration.

Example architecture

In the example architecture, we see a solution that uses Komprise to migrate file data from third-party NAS systems to ANQ. Komprise provides platform-agnostic file migration services at massive scale in heterogeneous NAS environments.
This solution facilitates the seamless migration of file data between mixed storage platforms, providing high-performance data movement, ensuring data integrity, and empowering you to successfully complete data migration projects from your legacy NAS to an ANQ instance.

Figure: Azure Native Qumulo's exabyte-scale file data platform and Komprise

Beyond inherent scalability and dynamic elasticity, ANQ and CNQ support enterprise-class data management features such as snapshots, replication, and quotas. ANQ and CNQ also offer multi-protocol support (NFS, SMB, FTP, and FTPS) for file sharing and storage access. Additionally, Azure supports a wide range of protocols for various services: for authentication and authorization, it commonly uses OAuth 2.0, OpenID Connect, and SAML; for IoT, MQTT, AMQP, and HTTPS are supported for device communication. By enabling shared access to the same data via all protocols, ANQ and CNQ support collaborative and mixed-use workloads, eliminating the need to import file data into object storage. Qumulo consistently delivers low time-to-first-byte latencies of 1-2 ms, offering a combined file and object platform for even the most performance-intensive AI and HPC workloads.

ANQ and CNQ can run in all Azure regions (although ANQ operates best in regions with three availability zones), allowing your on-premises data centers to take advantage of Azure's scalability, reliability, and durability. ANQ and CNQ can also be dynamically reconfigured without taking services offline, so you can adjust performance, temporarily or permanently, as workloads change. An ANQ or CNQ instance deployed initially as a disaster recovery or archive target can be converted into a high-performance data platform in seconds, without redeploying the service or migrating hosted data.

If you already use Qumulo storage on-premises or in other cloud platforms, Qumulo's Cloud Data Fabric enables seamless data movement between on-premises, edge, and Azure-based deployments. Connect portals between locations to build a Global Namespace and instantly extend your on-premises data to Azure's portfolio of cloud-native applications, such as Microsoft Copilot, AI Studio, Microsoft Fabric, and high-performance compute and GPU services for burst rendering or various HPC engines. Cloud Data Fabric moves files through a large-scale data pipeline instantly and seamlessly. Use Qumulo's continuous replication engine to enable disaster recovery scenarios, or combine replication with Qumulo's cryptographically locked snapshot feature to protect older versions of critical data from loss or ransomware. ANQ and CNQ leverage Azure Blob Storage's eleven-nines durability to achieve a highly available file system and utilize multiple availability zones for even greater availability, without the added costs typically associated with replication in other file systems.

Conclusion

The future of enterprise storage isn't just in the cloud; it's in smart, cloud-native infrastructure that scales with your business, not against it. Azure Native Qumulo (ANQ) and Cloud Native Qumulo (CNQ) on Microsoft Azure aren't just upgrades to legacy storage; they're a reimagining of what file systems can do in a cloud-first world. Whether you're running AI workloads, scaling HPC environments, or simply looking to escape the limitations of aging on-prem NAS, ANQ and CNQ give you the power to do it without compromise.
With elastic performance, utility-based pricing, and native integration with Azure services, Qumulo doesn't just support modernization; it accelerates it. To help you unlock these benefits, the Qumulo team is offering a free architectural assessment tailored to your environment and workloads. If you're ready to lead, not lag, and want to explore how ANQ and CNQ can transform your enterprise storage, reach out today by emailing Azure@qumulo.com. Let's build the future of your data infrastructure together.
Building a Scalable Web Crawling and Indexing Pipeline with Azure Storage and AI Search

In the ever-evolving world of data management, keeping search indexes up to date with dynamic data can be challenging. Traditional approaches, such as manual or scheduled indexing, are resource-intensive, delay-prone, and difficult to scale. An Azure Blob trigger combined with an AI Search indexer offers a cutting-edge solution to overcome these challenges, enabling real-time, scalable, and enriched data indexing. This blog explores how the Blob trigger, integrated with Azure Cognitive Search, transforms the indexing process by automating workflows and enriching data with AI capabilities. It highlights the step-by-step process of configuring Blob Storage, creating Azure Functions for triggers, and seamlessly connecting to an AI-powered search index. The approach leverages Azure's event-driven architecture, ensuring efficient and cost-effective data management.
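As a rough sketch of the pattern described above (not the exact implementation from the post; the function name, container path, and indexer name are assumptions), an Azure Functions blob trigger in Python can kick off an AI Search indexer run whenever new crawled content lands in Blob Storage:

```python
import logging
import azure.functions as func
from azure.identity import DefaultAzureCredential
from azure.search.documents.indexes import SearchIndexerClient

app = func.FunctionApp()

# Placeholder names: adjust the container path, connection setting, and indexer name.
@app.blob_trigger(arg_name="blob", path="crawled-pages/{name}", connection="AzureWebJobsStorage")
def on_new_page(blob: func.InputStream):
    logging.info("New crawled page uploaded: %s (%d bytes)", blob.name, blob.length)
    indexer_client = SearchIndexerClient(
        endpoint="https://<search-service>.search.windows.net",
        credential=DefaultAzureCredential(),
    )
    # Ask the AI Search indexer to pick up the new blob and refresh the index.
    indexer_client.run_indexer("web-crawl-indexer")
```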
Holding forensic evidence: The role of hybrid cloud in successful preservation and compliance

Disclaimer: The following is a post authored by our partner Tiger Technology. Tiger Technology has been a valued partner in the Azure Storage ecosystem for many years and we are happy to have them share details on their innovative solution!

Police departments worldwide are grappling with a digital explosion. From body camera footage to social media captures, the volume and variety of evidence have surged, creating a storage and management challenge like never before. Consider a single police department needing to store 2-5 petabytes of data and keep some of it for 100 years. How can they preserve the integrity of this data, make it cost-effective, and ensure compliance with legal requirements?

The answer lies in hybrid cloud solutions, specifically Microsoft Azure Blob Storage paired with Tiger Bridge. These solutions are empowering law enforcement to manage and store evidence at scale, without disrupting workflows. But what exactly is hybrid cloud, and why is it a game-changer for digital evidence management?

What is a hybrid cloud?

A hybrid cloud combines public or private cloud services with on-premises infrastructure. It gives organizations the flexibility to mix and match environments, allowing them to choose the best fit for specific applications and data. This flexibility is especially valuable in highly regulated industries like law enforcement, where strict data privacy and compliance rules govern how evidence is stored, processed, and accessed. Hybrid cloud also facilitates a smoother transition to public cloud solutions. For instance, when a data center reaches capacity, hybrid setups allow agencies to scale dynamically while maintaining control over their most sensitive data. It's not just about storage; it's about creating a robust, compliant infrastructure for managing enormous volumes of evidence.

What makes digital evidence so complex?

Digital evidence encompasses any information stored or transmitted in binary form that can be used in court. It includes computer hard drives, phone records, social media posts, surveillance footage, and more. The challenge isn't just collecting this data; it's preserving its integrity. Forensic investigators must adhere to strict chain-of-custody protocols to prove in court that the evidence:

- Is authentic and unaltered,
- Has been securely stored with limited access,
- Is readily available when needed.

With the surge in data volumes and complexity, traditional storage systems often fall short. That's where hybrid cloud solutions shine, offering scalable, secure, and cost-effective options that keep digital evidence admissible while meeting compliance standards.

The challenges police departments face

Digital evidence is invaluable, but storing and managing it is a challenging task that requires dealing with several aspects:

Short-term storage problems: The sheer scale of data can overwhelm local systems. Evidence must first be duplicated using forensic imaging to protect the original file. But housing these duplicates, especially with limited budgets, strains existing resources.

Long-term retention demands: In some jurisdictions, evidence must be retained for decades, sometimes up to a century. Physical storage media, like hard drives or SSDs, degrade over time and are expensive to maintain. Transitioning this data to cloud cold storage offers a more durable and cost-effective solution.

Data integrity and legal admissibility: Even the slightest suspicion of tampering can render evidence inadmissible.
Courts require robust proof of authenticity and integrity, including cryptographic hashes and digital timestamps. Failing to maintain a clear chain of custody could jeopardize critical cases.

Solving the storage puzzle with hybrid cloud

For law enforcement agencies, managing sensitive evidence isn't just about storage; it's about creating a system that safeguards data integrity, ensures compliance, and keeps costs under control. Traditional methods fall short in meeting these demands as the volume of digital evidence continues to grow. This is where hybrid cloud technology stands out, offering a powerful combination of on-premises infrastructure and cloud capabilities. Microsoft Azure, a leader in cloud solutions, brings critical features to the table, ensuring evidence remains secure, accessible, and compliant with strict legal standards. But storage alone isn't enough. Efficient file management is equally crucial for managing vast datasets while maintaining workflow efficiency. Tools like Tiger Bridge complement Microsoft Azure by bridging the gap between local and cloud storage, adding intelligence and flexibility to how evidence is preserved and accessed.

Microsoft Azure Blob Storage

Azure Blob Storage is massively scalable and secure object storage. For the purposes of law enforcement, among other features, it offers:

- Automatic tiering: Automatically moves data between hot and cold tiers, optimizing costs.
- Durability: Up to sixteen 9s (99.99999999999999%) of durability ensures data integrity for decades.
- Metadata management: Add custom tags or blob index tags, such as police case classifications, to automate retention reviews.

Microsoft Azure ensures evidence is secure, accessible, and compliant with legal standards.

Tiger Bridge: Smart File Management

Tiger Bridge enhances Microsoft Azure's capabilities by seamlessly integrating local and cloud storage, with powerful features tailored for forensic evidence management. Tiger Bridge is a software-only solution that integrates seamlessly with Windows servers. It handles file replication, space reclaiming, and archiving, all while preserving existing workflows and ensuring data integrity and disaster recovery. With Tiger Bridge, police departments can transition to hybrid cloud storage without adding hardware or altering processes.

Data replication

Tiger Bridge replicates files from on-premises storage to cloud storage, ensuring a secure backup. Replication policies run transparently in the background, allowing investigators to work uninterrupted. Files are duplicated based on user-defined criteria, such as priority cases or evidence retention timelines.

Space reclamation

Once files are replicated to the cloud, Tiger Bridge replaces local copies with "nearline" stubs. These stubs look like the original files but take up virtually no space. When a file is needed, it is automatically retrieved from the cloud, reducing storage strain on local servers.

Data archiving

For long-term storage, Tiger Bridge moves files from hot cloud tiers to cold and/or archive storage. Files in the archive tier are replaced with "offline" stubs. These files are not immediately accessible but can be manually retrieved and rehydrated when necessary. This capability allows law enforcement agencies to save on costs while still preserving access to critical evidence.
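For context on the rehydration step mentioned above, this is how an archived blob is generally brought back online with the Azure Blob Storage Python SDK; it is a generic sketch of the Azure mechanism, not of Tiger Bridge itself, and the account, container, and blob names are placeholders.

```python
from azure.identity import DefaultAzureCredential
from azure.storage.blob import BlobServiceClient

service = BlobServiceClient("https://<account>.blob.core.windows.net",
                            credential=DefaultAzureCredential())
blob = service.get_blob_client(container="evidence-archive",
                               blob="case-2023-0142/bodycam-0017.mp4")

# Request rehydration from the Archive tier back to Hot; the copy can take hours.
blob.set_standard_blob_tier("Hot", rehydrate_priority="Standard")

# Poll the archive status until rehydration completes; then the blob is readable again.
status = blob.get_blob_properties().archive_status
print("Rehydration in progress" if status else "Blob is online")
```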
Checksum for data integrity

On top of the strong data integrity and data protection features already built into the Azure Blob Storage service, Tiger Bridge goes a step further in ensuring data integrity by generating checksums for newly replicated files. These cryptographic signatures allow agencies to verify that files in the cloud are identical to the originals stored on premises. This feature is essential for forensic applications, where the authenticity of evidence must withstand courtroom scrutiny. Data integrity verification is performed during uploads and retrievals, ensuring that files remain unaltered while stored in the cloud. For law enforcement, checksum validation provides peace of mind, ensuring that evidence remains admissible in court and meets strict regulatory requirements.

Disaster Recovery

In the event of a local system failure, Tiger Bridge allows for immediate recovery. All data remains accessible in the cloud, and reinstalling Tiger Bridge on a new server re-establishes access without needing to re-download files.

A real-life scenario

Imagine a police department dealing with petabytes of video evidence from body cameras, surveillance footage, and digital device extractions. A simple yet effective real-life scenario follows a similar pattern:

- Investigators collect and image evidence files,
- Tiger Bridge replicates this data to Azure Blob Storage, following predefined rules,
- Active cases remain in Azure's hot tier, while archival data moves to cost-effective cold storage,
- Metadata tags in Azure help automate case retention reviews, flagging files eligible for deletion.

This approach ensures evidence is accessible when needed, secure from tampering, and affordable to store long-term.

The results speak for themselves. Adopting a hybrid cloud strategy delivers tangible benefits:

- Operational efficiency: Evidence is readily accessible without the need for extensive hardware investments and maintenance.
- Cost savings: Automating data tiering reduces storage costs while maintaining accessibility.
- Workflow continuity: Investigators can maintain existing processes with minimal disruption.
- Enhanced compliance: Robust security measures and chain-of-custody tracking ensure legal standards are met.

A future-proof solution for digital forensics

As digital evidence grows in both volume and importance, police organizations must evolve their storage strategies. Hybrid cloud solutions like Azure Blob Storage and Tiger Bridge offer a path forward: scalable, secure, and cost-effective evidence management designed for the demands of modern law enforcement. The choice is clear: preserve the integrity of justice by adopting tools built for the future.

About Tiger Technology

Tiger Technology helps organizations with mission-critical deployments optimize their on-premises storage and enhance their workflows through cloud services. The company is a validated ISV partner for Microsoft in three out of five Azure Storage categories: Primary and Secondary Storage; Archive, Backup and BCDR; and Data Governance, Management, and Migration. The Tiger Bridge SaaS offering on Azure Marketplace is Azure benefit-eligible data management software enabling seamless hybrid cloud infrastructure. Installed in the customer's on-premises or cloud environment, Tiger Bridge intelligently connects file data across file and object storage anywhere, for data lifecycle management, global file access, disaster recovery, data migration, and access to insights.
Tiger Bridge supports all Azure Blob Storage tiers, including the cold and archive tiers for long-term archival of data.

Read more by Tiger Technology on the Tech Community Blog:

- Modernization through Tiger Bridge Hybrid Cloud Data Services
- On-premises-first hybrid workflows in healthcare. Why start with digital pathology?
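As a side note to the checksum discussion earlier in this post, here is a generic way to record and verify a content hash with the Azure Blob Storage Python SDK. This is an illustrative sketch only, independent of how Tiger Bridge implements its own checksums, and the account, container, and file names are placeholders.

```python
import hashlib
from azure.storage.blob import BlobClient, ContentSettings

blob = BlobClient(
    account_url="https://<account>.blob.core.windows.net",
    container_name="evidence",
    blob_name="case-2023-0142/disk-image.e01",
    credential="<sas-token-or-key>",
)

with open("disk-image.e01", "rb") as f:
    data = f.read()
local_md5 = hashlib.md5(data).digest()

# Store the hash with the blob so it can be compared again at any later point.
blob.upload_blob(data, overwrite=True,
                 content_settings=ContentSettings(content_md5=local_md5))

# Later: confirm the stored hash still matches the original evidence image.
stored_md5 = blob.get_blob_properties().content_settings.content_md5
print("Integrity verified" if bytes(stored_md5) == local_md5 else "Hash mismatch!")
```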
Building an AI-Powered ESG Consultant Using Azure AI Services: A Case Study

In today's corporate landscape, Environmental, Social, and Governance (ESG) compliance has become increasingly important for stakeholders. To address the challenges of analyzing vast amounts of ESG data efficiently, a comprehensive AI-powered solution called ESGai has been developed. This blog explores how Azure AI services were leveraged to create a sophisticated ESG consultant for publicly listed companies. https://youtu.be/5-oBdge6Q78?si=Vb9aHx79xk3VGYAh

The Challenge: Making Sense of Complex ESG Data

Organizations face significant challenges when analyzing ESG compliance data. Manual analysis is time-consuming, prone to errors, and difficult to scale. ESGai was designed to address these pain points by creating an AI-powered virtual consultant that provides detailed insights based on publicly available ESG data.

Solution Architecture: The Three-Agent System

ESGai implements a sophisticated three-agent architecture, all powered by Azure's AI capabilities:

- Manager Agent: Breaks down complex user queries into manageable sub-questions containing specific keywords that facilitate vector search retrieval. The system prompt includes generalized document headers from the vector database for context.
- Worker Agent: Processes the sub-questions generated by the Manager, connects to the vector database to retrieve relevant text chunks, and answers the sub-questions. Results are stored in Cosmos DB for later use.
- Director Agent: Consolidates the answers from the Worker agent into a comprehensive final response tailored specifically to the user's original query.

It is important to note that while conceptually there are three agents, the Worker is actually a single agent that gets called multiple times, once for each sub-question generated by the Manager.

Current Implementation State

The current MVP implementation has several limitations that are planned for expansion:

- Limited Company Coverage: The vector database currently stores data for only 2 companies, with 3 documents per company (Sustainability Report, XBRL, and BRSR).
- Single Model Deployment: Only one GPT-4o model is currently deployed to handle all agent functions.
- Basic Storage Structure: The Blob container has a simple structure with a single directory. While Azure Blob Storage doesn't natively support hierarchical folders, the team plans to implement virtual folders in the future.
- Free Tier Limitations: Due to funding constraints, the AI Search service is using the free tier, which limits vector data storage to 50 MB.
- Simplified Vector Database: The current index stores all 6 files (3 documents x 2 companies) in a single vector database without filtering capabilities or a schema definition.

Azure Services Powering ESGai

The implementation of ESGai leverages multiple Azure services for a robust and scalable architecture:

- Azure AI Services: Provides pre-built APIs, SDKs, and services that incorporate AI capabilities without requiring extensive machine learning expertise. This includes access to 62 pre-trained models for chat completions through the AI Foundry portal.
- Azure OpenAI: Hosts the GPT-4o model for generating responses and the Ada embedding model for vectorization. The service combines OpenAI's advanced language models with Azure's security and enterprise features.
- Azure AI Foundry: Serves as an integrated platform for developing, deploying, and governing generative AI applications. It offers a centralized management center that consolidates subscription information, connected resources, access privileges, and usage quotas.
- Azure AI Search (formerly Cognitive Search): Provides both full-text and vector search capabilities, using the OpenAI ada-002 embedding model for vectorization. It is configured with a hybrid search algorithm (BM25 with Reciprocal Rank Fusion) for optimal chunk ranking.
- Azure Storage Services: Utilizes Blob Storage for storing PDFs, Business Responsibility and Sustainability Reports (BRSRs), and other essential documents. It integrates seamlessly with AI Search using indexers to track database changes.
- Cosmos DB: Employs the MongoDB API within Cosmos DB as a NoSQL database for storing chat history between agents and users.
- Azure App Services: Hosts the web application using a B3-tier plan optimized for cost efficiency, with GitHub Actions integrated for continuous deployment.

Project Evolution: From Concept to Deployment

The development of ESGai followed a structured approach through several phases:

Phase 1: Data Cleaning
- Extracted specific KPIs from XML/XBRL datasets and BRSR reports containing ESG data for 1,000 listed companies
- Cleaned and standardized data to ensure consistency and accuracy

Phase 2: RAG Framework Development
- Implemented Retrieval-Augmented Generation (RAG) to enhance responses by dynamically fetching relevant information
- Created a workflow that includes query processing, data retrieval, and response generation

Phase 3: Initial Deployment
- Deployed models locally using Docker and n8n automation tools for testing
- Identified the need for more scalable web services

Phase 4: Transition to Azure Services
- Migrated automation workflows from n8n to Azure AI Foundry services
- Leveraged Azure's comprehensive suite of AI services, storage solutions, and app hosting capabilities

Technical Implementation Details

Model Configurations: The GPT model is configured with:
- Model version: 2024-11-20
- Temperature: 0.7
- Max response tokens: 800
- Past messages: 10
- Top-p: 0.95
- Frequency/presence penalties: 0

The embedding model uses OpenAI text-embedding-ada-002 with 1536 dimensions and hybrid semantic search (BM25 RRF). A sketch of a chat completion call using these settings appears at the end of this section.

Cost Analysis and Efficiency

A detailed cost breakdown per user query reveals:
- App Server: $390-400
- AI Search: $5 per query
- RAG Query Processing: $4.76 per query
- Agent-specific costs:
  - Manager: $0.05 (30 input tokens, 210 output tokens)
  - Worker: $3.71 (1500 input tokens, 1500 output tokens)
  - Director: $1.00 (600 input tokens, 600 output tokens)

Challenges and Solutions

The team faced several challenges during implementation:

- Quota Limitations: Initial deployments encountered token quota restrictions, which were resolved through Azure support requests (typically granted within 24 hours).
- Cost Optimization: High costs associated with vectorization required careful monitoring. The team addressed this by shutting down unused services and deploying on services with free tiers.
- Integration Issues: GitHub Actions raised errors during deployment, which were resolved using GitHub's App Service Build Service.
- Azure UI Complexity: The team noted that Azure AI service naming conventions were sometimes confusing, as the same name is used for both parent and child resources.
- Free Tier Constraints: The AI Search service's free tier limitation of 50 MB for vector data storage restricts the amount of company information that can be included in the current implementation.
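For illustration, a chat completion call against an Azure OpenAI GPT-4o deployment with the settings listed above might look like the following minimal sketch. The endpoint, deployment name, API version, and prompts are placeholders, not ESGai's actual code.

```python
import os
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://<your-resource>.openai.azure.com",
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-06-01",
)

response = client.chat.completions.create(
    model="gpt-4o",           # the deployment name in Azure OpenAI
    temperature=0.7,
    max_tokens=800,
    top_p=0.95,
    frequency_penalty=0,
    presence_penalty=0,
    messages=[
        {"role": "system", "content": "You break a user question about ESG compliance into keyword-rich sub-questions."},
        {"role": "user", "content": "How does Company X manage water usage and board diversity?"},
    ],
)
print(response.choices[0].message.content)
```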
Future Roadmap

The current implementation is an MVP with several areas for expansion:

- Expand the database to include more publicly available sustainability reports beyond the current two companies
- Optimize token usage by refining query handling processes
- Research alternative embedding models to reduce costs while maintaining accuracy
- Implement a more structured storage system with virtual folders in Blob Storage
- Upgrade from the free tier of AI Search to support larger data volumes
- Develop a proper schema for the vector database to enable filtering and more targeted searches
- Scale to multiple GPT model deployments for improved performance and redundancy

Conclusion

ESGai demonstrates how advanced AI techniques like Retrieval-Augmented Generation can transform data-intensive domains such as ESG consulting. By leveraging Azure's comprehensive suite of AI services alongside a robust agent-based architecture, this solution provides users with actionable insights while maintaining scalability and cost efficiency. https://youtu.be/5-oBdge6Q78?si=Vb9aHx79xk3VGYAh
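To make the retrieval step described in this case study more concrete, here is a minimal hybrid (keyword plus vector) query sketch with the azure-search-documents library. The index name, vector field name, and embedding deployment are assumptions for illustration, not ESGai's actual schema.

```python
import os
from openai import AzureOpenAI
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
from azure.search.documents.models import VectorizedQuery

openai_client = AzureOpenAI(
    azure_endpoint="https://<your-resource>.openai.azure.com",
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-06-01",
)
search_client = SearchClient(
    endpoint="https://<search-service>.search.windows.net",
    index_name="esg-reports",  # assumed index name
    credential=AzureKeyCredential(os.environ["SEARCH_API_KEY"]),
)

question = "water usage reduction targets"
embedding = openai_client.embeddings.create(
    model="text-embedding-ada-002", input=question
).data[0].embedding

# Hybrid query: BM25 keyword scoring fused with vector similarity (RRF).
results = search_client.search(
    search_text=question,
    vector_queries=[VectorizedQuery(vector=embedding, k_nearest_neighbors=5,
                                    fields="content_vector")],
    top=5,
)
for doc in results:
    print(doc["content"][:200])
```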
How to Automate Cross-OS File Fixes with Azure Automation and PowerShell

Build a serverless file fixer in Azure using Azure Automation, PowerShell, Blob Storage, and Event Grid. Learn how to set up the necessary resources, configure permissions, and automatically detect and correct cross-OS file issues, such as CRLF vs LF line endings and file permission mismatches. This streamlined approach saves time and eliminates manual fixes, ensuring smoother, error-free workflows for developers working across different operating systems.
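The post above uses PowerShell runbooks; purely as an illustration of the core fix (the container and blob names are placeholders), normalizing a text blob's line endings can be as simple as this sketch with the Azure Blob Storage Python SDK:

```python
from azure.identity import DefaultAzureCredential
from azure.storage.blob import BlobServiceClient

service = BlobServiceClient("https://<account>.blob.core.windows.net",
                            credential=DefaultAzureCredential())
blob = service.get_blob_client(container="scripts", blob="deploy/build.sh")

# Download the file, convert Windows CRLF line endings to Unix LF, and re-upload.
text = blob.download_blob().readall()
normalized = text.replace(b"\r\n", b"\n")
if normalized != text:
    blob.upload_blob(normalized, overwrite=True)
    print("Line endings normalized to LF")
```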