ai
5 Topics- Beyond Basics: Practical scenarios with Azure Storage ActionsIf you are new to Azure Storage Actions, check out our GA announcement blog for an introduction. This post is for cloud architects, data engineers, and IT admins who want to automate and optimize data governance at scale. The Challenge: Modern Data Management at Scale As organizations generate more data than ever, managing that data efficiently and securely is a growing challenge. Manual scripts, periodic audits, and ad-hoc cleanups can’t keep up with the scale, complexity, and compliance demands of today’s cloud workloads. Teams need automation that’s reliable, scalable, and easy to maintain. Azure Storage Actions delivers on this need by enabling policy-driven automation for your storage accounts. With Storage Actions, you can: Automate compliance (e.g., legal holds, retention) Optimize storage costs (e.g., auto-tiering, expiry) Reduce operational overhead (no more custom cleanup scripts) Improve data discoverability (tagging, labeling) Real-World Scenarios: Unlocking the Power of Storage Actions Let’s explore 3 practical scenarios where Storage Actions can transform customers’ data management approach. For each, we’ll look at the business problem, the traditional approach, and how Storage Actions makes it easier with the exact conditions and operations which can be used. Scenario 1: Content Lifecycle for Brand Teams Business Problem: Brand and marketing teams manage large volumes of creative assets - videos, design files, campaign materials that evolve through multiple stages and often carry licensing restrictions. These assets need to be retained, frozen, or archived based on their lifecycle and usage rights. Traditionally, teams rely on scripts or manual workflows to manage this, which can be error-prone, slow, and difficult to scale. How Storage Actions Helps: Azure Storage Actions enables brand teams to automate the content lifecycle management using blob metadata and / or index tag. With a single task definition using an IF and ELSE structure, teams can apply different operations to blobs based on their stage, licensing status, and age without writing or maintaining scripts. Example in Practice: Let’s say a brand team manages thousands of creative assets videos, design files, campaign materials each tagged with blob metadata that reflects its lifecycle stage and licensing status. For instance: Assets that are ready for public use are tagged with asset-stage = final Licensed or restricted-use content is tagged with usage-rights = restricted Over time, these assets accumulate in your storage account, and you need a way to: Ensure that licensed content is protected from accidental deletion or modification Archive older final assets to reduce storage costs Apply these rules automatically, without relying on scripts or manual reviews With Azure Storage Actions, the team can define a single task that evaluates each blob and applies the appropriate operation using a simple IF and ELSE structure: IF: - Metadata.Value["asset-stage"] equals "final" - AND Metadata.Value["usage-rights"] equals “restricted” - AND creationTime < 60d THEN: - SetBlobLegalHold: This locks the blob to prevent deletion or modification, ensuring compliance with licensing agreements. - SetBlobTier to Archive: This moves the blob to the Archive tier, significantly reducing storage costs for older content that is rarely accessed. ELSE - SetBlobTier to Cool: If the blob does not meet the above criteria whether it’s a draft, unlicensed, or recently created, it is moved to the Cool tier. Once this Storage Action is created and assigned to a storage account, it is scheduled to run automatically every week. During each scheduled run, the task evaluates every blob in the target container or account. For each blob, it checks if the asset is marked as final, tagged with usage-rights, and older than 60 days. If all these conditions are met, the blob is locked with a legal hold to prevent accidental deletion and then archived to optimize storage costs. If the blob does not meet all of these criteria, it is moved to the Cool tier, ensuring it remains accessible but stored more economically. This weekly automation ensures that every asset is managed appropriately based on its metadata, without requiring manual intervention or custom scripts. Scenario 2: Audit-Proof Model Training Business Problem: In machine learning workflows, ensuring the integrity and reproducibility of training data is critical especially when models influence regulated decisions in sectors like automotive, finance, healthcare, or legal compliance. Months or even years after a model is deployed, auditors or regulators may request proof that the training data used has not been altered since the model was built. Traditionally, teams try to preserve training datasets by duplicating them into backup storage, applying naming conventions, and manually restricting access. These methods are error-prone, hard to enforce at scale, and lack auditability. How Storage Actions Helps: Storage Actions enables teams to automate the preservation of validated training datasets using blob tags and immutability policies. Once a dataset is marked as clean and ready for training, Storage Actions can automatically: Lock the dataset using a time-based immutability policy Apply a tag to indicate it is a snapshot version This ensures that the dataset cannot be modified or deleted for the duration of the lock, and it is easily discoverable for future audits. Example in Practice: Let’s say an ML data pipeline tags a dataset with stage=clean after it passes validation and is ready for training. Storage Actions detects this tag and springs into action. It enforces a 1-year immutability policy, which means the dataset is locked and cannot be modified or deleted for the next 12 months. It also applies a tag snapshot=true, making it easy to locate and reference in future audits or investigations. The following conditions and operations define the task logic: IF: - Tags.Value[stage] equals 'clean' THEN: - SetBlobImmutabilityPolicy for 1-year: This adds a write once, read many (WORM) immutability policy on the blob to prevent deletion or modification, ensuring compliance. - SetBlobTags with snapshot=true: This adds a blob index tag with name “snapshot” and value “true”. Whenever this task runs on its scheduled interval - such as daily or weekly, it detects if a blob has the tag stage = 'clean', it automatically initiates the configured operations. In this case, Storage Actions applies a SetBlobImmutabilityPolicy on the blob for one year and adds a snapshot=true tag for easy identification. This means that without any manual intervention: The blob is made immutable for 12 months, preventing any modifications or deletions during that period. A snapshot=true tag is applied, making it easy to locate and audit later. No scripts, manual tagging, or access restrictions are needed to enforce data integrity. This ensures that validated training datasets are preserved in a tamper-proof state, satisfying audit and compliance requirements. It also reduces operational overhead by automating what would otherwise be a complex and error-prone manual process. Scenario 3: Embedding Management in AI Workflows Business Problem: Modern AI systems, especially those using Retrieval-Augmented Generation (RAG), rely heavily on vector embeddings to represent and retrieve relevant context from large document stores. These embeddings are often generated in real time, chunked into small files, and stored in vector databases or blob storage. As usage scales, these systems generate millions of small embedding files, many of which become obsolete quickly due to frequent updates, re-indexing, or model version changes. This silent accumulation of stale embeddings leads to: Increased storage costs Slower retrieval performance Operational complexity in managing the timings Traditionally, teams write scripts to purge old embeddings based on timestamps, run scheduled jobs, and manually monitor usage. This approach is brittle and does not scale well. How Storage Actions Helps: Storage Actions enables customers to automate the management of embeddings using blob tags and metadata. With blobs being identified with tags and metadata such as embeddings=true, modelVersion=latest, customers can define conditions that automatically delete stale embeddings without writing custom scripts. Example in Practice: In production RAG systems, embeddings are frequently regenerated to reflect updated content, new model versions, or refined chunking strategies. For example, a customer support chatbot may re-index its knowledge base daily to ensure responses are grounded in the latest documentation. To avoid bloating storage with outdated vector embeddings, Storage Actions can automate cleanup with task conditions and operation such as: IF: - Tags.Value[embeddings] equals 'true' - AND NOT Tags.Value[version] equals ‘latest’ - AND creation time < 12 days ago THEN: - DeleteBlob: This deletes all blobs which match the IF condition criteria. Whenever this Storage Action runs on its scheduled interval - such as daily - it scans for blobs that have the tag embeddings = ‘true’ and is not the latest version with its age being more than 12 days old, it automatically initiates the configured operation. In this case, Storage Actions does a DeleteBlob operation on the blob. This means that without any manual intervention: The stale embeddings are deleted No scripts or scheduled jobs are needed to track. This ensures that only the most recent model’s embeddings are retained, keeping the vector store lean and performant. It also reduces storage costs by eliminating obsolete data and helps maintain retrieval accuracy by ensuring outdated embeddings do not interfere with current queries. Applying Storage Actions to Storage Accounts To apply any of the scenarios, customers create an assignment during the storage task resource creation. In the assignment creation flow, they select the appropriate role and configure filters and trigger details. For example, a compliance cleanup scenario might run across the entire storage account with a recurring schedule every seven days to remove non-compliant blobs. A cost optimization scenario could target a specific container using a blob prefix and run as a one-time task to archive older blobs. A bulk tag update scenario would typically apply to all blobs without filtering and use a recurring schedule to keep tags consistent. After setting start and end dates, specifying the export container, and enabling the task, clicking Add queues the action to run on the account. Learn More If you are interested in exploring Storage Actions further, there are several resources to help you get started and deepen your understanding: Documentation on Getting Started: https://learn.microsoft.com/en-us/azure/storage-actions/storage-tasks/storage-task-quickstart-portal Create a Storage Action from the Azure Portal: https://portal.azure.com/#create/Microsoft.StorageTask Azure Storage Actions pricing: https://azure.microsoft.com/en-us/pricing/details/storage-actions/#pricing Azure Blog about the GA announcement: https://azure.microsoft.com/en-us/blog/unlock-seamless-data-management-with-azure-storage-actions-now-generally-available/ Azure Skilling Video with a walkthrough of Storage Actions: https://www.youtube.com/watch?v=CNdMFhdiNo8 Have questions, feedback, or a scenario to share? Drop a comment below or reach out to us at storageactions@microsoft.com. We would love to hear how you are using Storage Actions and what scenarios you would like to see next!377Views1like0Comments
- How Microsoft Azure and Qumulo Deliver a Truly Cloud-Native File System for the EnterpriseDisclaimer: The following is a post authored by our partner Qumulo. Qumulo has been a valued partner in the Azure Storage ecosystem for many years and we are happy to share details on their unique approach to solving challenges of scalable filesystems! Whether you’re training massive AI models, running HPC simulations in life sciences, or managing unstructured media archives at scale, performance is everything. Qumulo and Microsoft Azure deliver the cloud-native file system built to handle the most data-intensive workloads, with the speed, scalability, and simplicity today's innovators demand. But supporting modern workloads at scale is only part of the equation. Qumulo and Microsoft have resolved one of the most entrenched and difficult challenges in modernizing the enterprise data estate: empowering file data with high performance across a global workforce without impacting the economics of unstructured data storage. According to Gartner, global end-user spending on public cloud services is set to surpass $1 trillion by 2027. That staggering figure reflects more than just a shift in IT budgets—it signals a high-stakes race for relevance. CIOs, CTOs, and other tech-savvy execs are under relentless pressure to deliver the capabilities that keep businesses profitable and competitive. Whether they’re ready or not, the mandate is clear: modernize fast enough to keep up with disruptors, many of whom are using AI and lean teams to move at lightning speed. To put it simply, grow margins without getting outpaced by a two-person startup using AI in a garage. That’s the challenge leaders face every day. Established enterprises must contend with the duality of maintaining successful existing operations and the potential disruption to those operations by a more agile business model that offers insight into the next wave of customer desires and needs. Nevertheless, established enterprises have a winning move - unleash the latent productivity increases and decision-making power hidden within years, if not decades, worth of data. Thoughtful CIOs, CTOs, and CXOs have elected to move slowly in these areas due to the tyranny of quarterly results and the risk of short-term costs reflecting poorly on the present at the expense of the future. In this sense, adopting innovative technologies forced organizations to choose between self-disruption with long-term benefits or non-disruptive technologies with long-term disruption risk. When it comes to network-attached storage, CXOs were forced to accept non-disruptive technologies because the risk was too high. This trade-off is no longer required. Microsoft and Qumulo have addressed this challenge in the realm of unstructured file data technologies by delivering a cloud-native architecture that combines proven Azure primitives with Qumulo’s suite of file storage solutions. Now, those patient CXOs, waiting to adopt hardened technologies, can shift their file data paradigm into Azure while improving business value, data portability, and reducing the financial burden on their business units. Today, organizations that range from 50,000+ employees with global offices, to organizations with a few dozen employees with unstructured data-centric operations have discovered the incredible performance increases, data availability, accessibility, and economic savings realized when file data moves into Azure using one of two Qumulo solutions: Option 1 — Azure Native Qumulo (ANQ) is a fully managed file service that delivers truly elastic capacity, throughput, and IOPS, along with all the enterprise features of your on-premises NAS and a TCO to match. Option 2 — Cloud Native Qumulo (CNQ) on Microsoft Azure is a self-hosted file data service that offers the performance and scale your most demanding workloads require, at a comparable total cost of ownership to on-premises storage. Both CNQ on Microsoft Azure and ANQ offer the flexibility and capacity of object storage while remaining fully compatible with file-based workflows. As data platforms purpose-built for the cloud, CNQ and ANQ provide three key characteristics: Elasticity — Performance and capacity can scale independently, both up and down, dynamically. Boundless Scale — Virtually no limitations on file system size or file count, with full multi-protocol support. Utility-Based Pricing — Like Microsoft Azure, Qumulo operates on a pay-as-you-go model, charging only for resources used without requiring pre-provisioned capacity or performance. The collaboration between Qumulo’s cloud-native file solutions and the Microsoft Azure ecosystem enables seamless migration of a wide range of workflows, from large-scale archives to high-performance computing (HPC) applications, from on-premises environments to the cloud. For example, a healthcare organization running a fully cloud-hosted Picture Archiving and Communication System (PACS) alongside a Vendor Neutral Archive (VNA) can leverage Cloud Native Qumulo (CNQ) to manage medical imaging data in Azure. CNQ offers a HIPAA-compliant, highly durable, and cost-efficient platform for storing both active and infrequently accessed diagnostic images, enabling secure access while optimizing storage costs. With Azure’s robust cloud infrastructure, organizations can design a cloud file solution that scales to meet virtually any size or performance requirement, while unlocking new possibilities in cloud-based AI and HPC workloads. Further, using the Qumulo Cloud Data Fabric, the enterprise is able to connect geographically separated data sources within one unified, strictly consistent (POSIX-compliant), secure, and high-performance file system. As organizational needs evolve — whether new workloads are added or existing workloads expand — Cloud Native Qumulo or Azure Native Qumulo can easily scale to meet performance demands while maintaining the predictable economics that meet existing or shrinking budgets. About Azure Native Qumulo and Cloud Native Qumulo on Azure Azure Native Qumulo (ANQ) and Cloud Native Qumulo (CNQ) enable organizations to leverage a fully customizable, multi-protocol solution that dynamically scales to meet workload performance requirements. Engineered specifically for the cloud, ANQ is designed for simplicity of operation and automatic scalability as a fully managed service. CNQ offers the same great technology, directly leveraging cloud-native resources like Azure Virtual Machines (VMs), Azure Networking, and Azure Blob Storage to provide a scalable platform that adapts to the evolving needs of today’s workloads – but deploys entirely in the enterprise tenant, allows for direct control over the underlying infrastructure, and requires a little bit higher level of internal expertise to operate. Azure Native Qumulo and Cloud Native Qumulo on Azure also deliver a fully dynamic file storage platform that is natively integrated with the Microsoft Azure backend. Here’s what sets ANQ and CNQ apart: Elastic Scalability — Each ANQ and CNQ instance on Azure Blob Storage can automatically scale to exabyte-level storage within a single namespace by simply adding data. On Microsoft Azure, performance adjustments are straightforward: just add or remove compute instances to instantly boost throughput or IOPS, all without disruption and within minutes. Plus, you pay only for the capacity and compute resources you use. Deployed in Minutes — ANQ deploys from the Azure Portal, CLI, or PowerShell, just like a native service. CNQ runs in your own Azure virtual network and can be deployed via Terraform. You can select the compute type that best matches your workload’s performance requirements and build a complete file data platform on Azure in under six minutes for a three-node cluster. Automatic TCO Management — can be facilitated through services like Komprise Intelligent Tiering for Azure and Azure Blob Storage access tiers. It optimizes storage costs and manages data lifecycle. By analyzing data access patterns, these systems move files or objects to appropriate tiers, reducing costs for infrequently accessed data. Additionally, all data written to CNQ is compressed to ensure maximum cost efficiency. ANQ automatically adapts to your workload requirements, and CNQ’s fully customizable architecture can be configured to meet the specific throughput and IOPS requirements of virtually any file or object-based workload. You can purchase either ANQ or CNQ through a pay-as-you-go model, eliminating the need to pre-provision cloud file services. Simply pay for what you use. ANQ and CNQ deliver comparable performance and services to on-premises file storage at a similar TCO. Qumulo’s cloud-native architecture redefines cloud storage by decoupling capacity from performance, allowing both to be adjusted independently and on demand. This provides the flexibility to modify components such as compute instance type, compute instance count, and cache disk capacity — enabling rapid, non-disruptive performance adjustments. This architecture, which includes the innovative Predictive Cache, delivers exceptional elasticity and virtually unlimited capacity. It ensures that businesses can efficiently manage and scale their data storage as their needs evolve, without compromising performance or reliability. ANQ and CNQ retain all the core Qumulo functionalities — including real-time analytics, robust data protection, security, and global collaboration. Example architecture In the example architecture, we see a solution that uses Komprise to migrate file data from third-party NAS systems to ANQ. Komprise provides platform-agnostic file migration services at massive scale in heterogeneous NAS environments. This solution facilitates the seamless migration of file data between mixed storage platforms, providing high-performance data movement, ensuring data integrity, and empowering you to successfully complete data migration projects from your legacy NAS to an ANQ instance. Figure: Azure Native Qumulo’s exabyte-scale file data platform and Komprise Beyond inherent scalability and dynamic elasticity, ANQ and CNQ support enterprise-class data management features such as snapshots, replication, and quotas. ANQ and CNQ also offer multi-protocol support — NFS, SMB, FTP, and FTP-S — for file sharing and storage access. Additionally, Azure supports a wide range of protocols for various services. For authentication and authorization, it commonly uses OAuth 2.0, OpenID Connect, and SAML. For IoT, MQTT, AMQP, and HTTPS are supported for device communication. By enabling shared access to the same data via all protocols, ANQ and CNQ support collaborative and mixed-use workloads, eliminating the need to import file data into object storage. Qumulo consistently delivers low time-to-first-byte latencies of 1–2ms, offering a combined file and object platform for even the most performance-intensive AI and HPC workloads. ANQ and CNQ can run in all Azure regions (although ANQ operates best in regions with three availability zones), allowing your on-premises data centers to take advantage of Azure’s scalability, reliability, and durability. ANQ and CNQ can also be dynamically reconfigured without taking services offline, so you can adjust performance — temporarily or permanently — as workloads change. An ANQ or CNQ instance deployed initially as a disaster recovery or archive target can be converted into a high-performance data platform in seconds, without redeploying the service or migrating hosted data. If you already use Qumulo storage on-premises or in other cloud platforms, Qumulo’s Cloud Data Fabric enables seamless data movement between on-premises, edge, and Azure-based deployments. Connect portals between locations to build a Global Namespace and instantly extend your on-premises data to Azure’s portfolio of cloud-native applications, such as Microsoft Copilot, AI Studio, Microsoft Fabric, and high-performance compute and GPU services for burst rendering or various HPC engines. Cloud Data Fabric moves files through a large-scale data pipeline instantly and seamlessly. Use Qumulo’s continuous replication engine to enable disaster recovery scenarios, or combine replication with Qumulo’s cryptographically locked snapshot feature to protect older versions of critical data from loss or ransomware. ANQ and CNQ leverage Azure Blob’s 11-nines durability to achieve a highly available file system and utilizes multiple availability zones for even greater availability — without the added costs typically associated with replication in other file systems. Conclusion The future of enterprise storage isn’t just in the cloud — it’s in smart, cloud-native infrastructure that scales with your business, not against it. Azure Native Qumulo (ANQ) and Cloud Native Qumulo (CNQ) on Microsoft Azure aren’t just upgrades to legacy storage — they’re a reimagining of what file systems can do in a cloud-first world. Whether you're running AI workloads, scaling HPC environments, or simply looking to escape the limitations of aging on-prem NAS, ANQ and CNQ give you the power to do it without compromise. With elastic performance, utility-based pricing, and native integration with Azure services, Qumulo doesn’t just support modernization — it accelerates it. To help you unlock these benefits, the Qumulo team is offering a free architectural assessment tailored to your environment and workloads. If you’re ready to lead, not lag, and want to explore how ANQ and CNQ can transform your enterprise storage, reach out today by emailing Azure@qumulo.com. Let’s build the future of your data infrastructure together.555Views1like0Comments
- Building a Scalable Web Crawling and Indexing Pipeline with Azure storage and AI SearchIn the ever-evolving world of data management, keeping search indexes up-to-date with dynamic data can be challenging. Traditional approaches, such as manual or scheduled indexing, are resource-intensive, delay-prone, and difficult to scale. Azure Blob Trigger combined with an AI Search Indexer offers a cutting-edge solution to overcome these challenges, enabling real-time, scalable, and enriched data indexing. This blog explores how Blob Trigger, integrated with Azure Cognitive Search, transforms the indexing process by automating workflows and enriching data with AI capabilities. It highlights the step-by-step process of configuring Blob Storage, creating Azure Functions for triggers, and seamlessly connecting with an AI-powered search index. The approach leverages Azure's event-driven architecture, ensuring efficient and cost-effective data management.2KViews7likes10Comments
- Dell APEX File Storage for Microsoft Azure brings a powerful new option to our customersDell PowerScale OneFS has been trusted by customers across all industries to provide performant, resilient, and scalable multiprotocol file storage for nearly two decades. At Ignite 2024 we announced a new Dell managed variant that compliments the existing offering to give you powerful choice from a proven industry leader.892Views0likes0Comments