Azure Storage Blog
Breaking the Barrier in Azure: 1TB/s File System Throughput with Qumulo

dukicn (Microsoft)
Jan 09, 2025

Elastic, durable, and predictable: Cost-effective 1TB/s file system throughput

Disclaimer: The following post has been co-authored by our ISV, Qumulo. Qumulo provides cloud and hybrid file data platforms. Qumulo on Azure is trusted by Fortune 500 companies and global enterprises to manage petabytes of data. For more information, visit www.qumulo.com.

With Qumulo and Microsoft Azure, transferring a terabyte of data in a single second becomes a reality.

Today, amidst the explosion of digital information, cloud computing, and AI, the ability to access data at the ever-increasing speed of business becomes a competitive advantage. That said, speed alone isn’t the solution. Speed must be combined with the most advanced cloud applications and AI tools to unlock smarter decisions, improve workforce efficiency, and make possible in minutes what previously required days or months.

Data acceleration combined with today’s latest tools will be what drives your next wave of innovation, cost savings, and top-line revenue.

For example, 1 TB/s throughput means you can:

  • Analyze about 1,400 human genome models simultaneously, reducing the time to find new cures and treatments. (The human genome is roughly 725 MB.)
  • Process the entire volume of NYSE contracts created every day in 6.7 minutes, leading to more accurate forecasts, protecting pensions and investments. (NYSE is estimated to create 402.74 TB of contract data daily.)
  • Collaborate with your colleagues around the world on complex project files without experiencing latency or conflicts. (Imagine a document with 50,000 concurrent editors!)
  • Build new AI models using your own data without specialty configurations, expensive project consultants, or copying data into a separate platform.

*Workload and sizing data provided by Bing.
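The first two figures in the list above follow from simple division. Here is a minimal sketch of that arithmetic, assuming decimal units (1 TB = 1,000,000 MB) and a sustained 1 TB/s. The variable names are illustrative and the input sizes are the ones quoted in the post.

```python
# Back-of-the-envelope check of the throughput examples above.
# Assumption: decimal units (1 TB = 1,000,000 MB) and a sustained 1 TB/s.
THROUGHPUT_TB_PER_S = 1.0
MB_PER_TB = 1_000_000

GENOME_SIZE_MB = 725      # human genome size quoted in the post
NYSE_DAILY_TB = 402.74    # daily NYSE contract data quoted in the post

# How many genome-sized files fit into one second of throughput
genomes_per_second = THROUGHPUT_TB_PER_S * MB_PER_TB / GENOME_SIZE_MB

# How long it takes to stream one day of NYSE data, in minutes
nyse_minutes = NYSE_DAILY_TB / THROUGHPUT_TB_PER_S / 60

print(f"Genomes streamed per second: {genomes_per_second:,.0f}")
print(f"Minutes to read one day of NYSE data: {nyse_minutes:.1f}")
```

This gives roughly 1,379 genomes per second (rounded to "about 1,400" in the list) and 6.7 minutes for the NYSE data.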

With Azure and Qumulo, you can do all of this and so much more using Cloud Native Qumulo (CNQ, a PaaS offering) or Azure Native Qumulo (ANQ, a SaaS offering). The best part is that these high-performance scenarios do not break the bank.

Both CNQ and ANQ services are enterprise-grade solutions offering a single namespace with robust features such as snapshots, file locking, replication, a global namespace, and seamless multi-protocol capabilities. CNQ and ANQ both support native simultaneous SMB and NFS performance. And unlike any other solution available, CNQ and ANQ are both non-disruptively elastic, meaning you can scale performance up or down dynamically, independent of storage capacity (which can also scale up or down). You only pay for what you use when you use it, and you stop paying when you delete data or reduce the performance requirements. These innovations combine to make the Qumulo platform in Azure an ideal foundation for the data-intensive demands of the AI revolution.

Meeting the Throughput Challenge

With the advancements in high performance virtual machines and ultra-fast networking, it is paramount that the flow of data from storage does not become a bottleneck.

Qumulo's performance capabilities in Azure, including the ability to stream massive amounts of data with unparalleled throughput, provide a distinct advantage to businesses at the forefront of generating value from data models.

Here are some active examples:

| Use Case | Benefit |
| --- | --- |
| Media and Entertainment | Edit and render 8K video files in real-time, cutting production timelines, and empowering a global VFX talent and artist community, ultimately enhancing viewer experiences. |
| Scientific Research | Transfer and analyze petabyte-scale datasets like genome sequences or physics simulations with unprecedented speed. |
| AI/ML | Train and deploy models faster by eliminating storage I/O bottlenecks, accelerating the development and deployment of sophisticated machine learning models. |
| Financial Modeling | Perform real-time market analysis, risk evaluations, and optimizations, ensuring timely decision-making. |
| Uninterrupted Operations | Ensures business continuity with near-instantaneous recovery times for backup and disaster recovery, minimizing downtime and protecting revenue streams. |
| Global Collaboration | Augments global collaboration by enabling seamless sharing of voluminous datasets across geographical boundaries, fostering productivity and real-time co-creation. |

Inside the Architecture: Bringing the Solution to Life

Delivering 1TB/s throughput in Azure utilized Qumulo’s cloud native architecture, which is shared by both ANQ and CNQ products. The description below outlines the configuration and fine tuning required to achieve this performance; however, these customizations are available to any customer requiring this level of support. To learn more about the logical architecture of ANQ and CNQ inside Azure, visit azure.qumulo.com.

Key Components:

Networking: A single Virtual Network (VNet) forms the foundation, hosting two distinct subnets: one dedicated to the load generators (workers) responsible for driving the data flow, and another housing the CNQ cluster itself. This segregation optimizes network traffic and enhances performance. To further maximize throughput, we ensured all network interfaces leverage Azure's advanced networking features.

Compute: Virtual machine (VM) scale sets provide dynamic scalability for the load generator workers, adapting to fluctuating demands with agility. The CNQ cluster leverages high-performance VMs equipped with Premium SSDv2 for write-caching. CNQ lets customers select the appropriate VM type; for this benchmark, the aggregate egress bandwidth of all cluster VMs had to support 1TB/s.
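To make the aggregate-egress requirement concrete, here is a rough sketch of the per-node bandwidth it implies. It assumes the 120 CNQ cluster nodes listed in the cost table later in this post, decimal units, and perfectly even load across nodes with no protocol overhead, so treat the result as a lower bound and not as a VM sizing recommendation.

```python
# Per-node egress needed for the cluster to deliver 1 TB/s in aggregate.
# Assumptions: decimal units, traffic spread evenly across nodes,
# and no allowance for protocol overhead or headroom.
TARGET_BYTES_PER_S = 1e12   # 1 TB/s
CNQ_NODES = 120             # CNQ cluster node count from the cost table

per_node_gbytes = TARGET_BYTES_PER_S / CNQ_NODES / 1e9   # GB/s per node
per_node_gbits = per_node_gbytes * 8                     # Gbit/s per node

print(f"Per-node egress: {per_node_gbytes:.2f} GB/s ({per_node_gbits:.1f} Gbit/s)")
```

Under these assumptions each node has to sustain about 8.33 GB/s, or roughly 66.7 Gbit/s, which is why the VM type's network bandwidth matters as much as its CPU or memory.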

Storage: Beyond the high-performance SSDs used for write-caching, the architecture utilizes Azure Blob storage for persistent data. CNQ capitalizes on Azure Blob storage's scalability, durability, and cost-effectiveness, providing a robust persistence layer for managing vast datasets beyond the read or write cache layers.

Security: Azure Key Vault safeguards sensitive credentials, while Network Security Groups (NSGs) act as the gatekeepers, restricting traffic and bolstering the overall security posture. For this specific benchmarking environment, we removed potential blockers such as local firewalls and security software; these measures can be reinstated for production environments.

Fine-Tuning the NAS Clients for Peak Performance

Beyond selecting the right components, we further optimized every aspect of the architecture to push the upper limits of non-specialized Azure components, including the Ubuntu Linux clients that were driving the throughput demand. Our goal was to prove these performance numbers are achievable without special SKUs or technologies. The client-side optimizations included:

  • Block size optimization ensured efficient data transfer and minimized overhead,
  • Unnecessary OS packages were removed to reduce overhead and streamline the operating system,
  • User data scripts automated the configuration of load generators, ensuring speed and consistency.

This approach unlocked unprecedented performance levels within Azure's infrastructure. 
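The post does not say which block sizes or tools were used, so the following is only an illustrative sketch of how block size affects sequential read throughput. The function names and the sample file are hypothetical. Run against a local temporary file, the numbers mostly reflect the page cache; on a real NFS client you would point it at a file on the mount, where mount options such as rsize and wsize also come into play.

```python
import os
import tempfile
import time


def read_throughput(path: str, block_size: int) -> tuple[int, float]:
    """Read `path` sequentially in `block_size` chunks; return (bytes, seconds)."""
    total = 0
    start = time.perf_counter()
    # buffering=0 opens a raw, unbuffered file so block_size controls each read
    with open(path, "rb", buffering=0) as f:
        while chunk := f.read(block_size):
            total += len(chunk)
    return total, time.perf_counter() - start


def make_sample_file(size_bytes: int) -> str:
    """Create a throwaway file of random bytes (stand-in for a file on a NAS mount)."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(os.urandom(size_bytes))
    return path


if __name__ == "__main__":
    sample = make_sample_file(8 * 1024 * 1024)
    try:
        for block_size in (4 * 1024, 64 * 1024, 1024 * 1024):
            nbytes, seconds = read_throughput(sample, block_size)
            rate = nbytes / seconds / 1e6
            print(f"{block_size // 1024:>5} KiB blocks: {rate:,.0f} MB/s")
    finally:
        os.remove(sample)
```

Larger blocks mean fewer read calls per byte transferred, which is the overhead the first bullet refers to. The best value depends on the client, the protocol, and the workload, so it is worth measuring and not guessing.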

[Figure: Cloud Native Qumulo (Azure) architecture diagram]

[Figure: Qumulo's performance dashboard]

This architecture is inherently designed for scalability and resilience. It can dynamically adapt to fluctuating workload demands, ensuring consistent performance and eliminating downtime, even during peak usage. We set our sights on 1 TB/s as a benchmark, but this system is designed to scale far beyond that: CNQ is capable of delivering multiple TB/s to meet the demands of even the most ambitious workloads.

The Strategic Advantage

Customers using Qumulo on Azure benefit from these critical advantages when compared to workloads using other architectures:

  • Real-Time Processing: Process up to a terabyte of data in a single second, eliminating performance bottlenecks that hinder innovation.
  • Unmatched Scalability: Adapt seamlessly to fluctuating workloads, ensuring consistent performance and eliminating downtime, even during peak demand.
  • Optimized Cost Efficiency: Leverage Azure's flexible pricing models to optimize costs without compromising performance, achieving an ideal balance between efficiency and expenditure.

Collectively, our goal is to eliminate the need to choose between performance and economics - Cloud Native Qumulo and Azure Native Qumulo are designed to deliver both.

Cost and Efficiency Insights

By strategically utilizing VM Scale Sets and premium storage, the architecture dynamically adapts to workload fluctuations, minimizing idle costs and optimizing resource utilization. To illustrate the cost-effectiveness of this solution, here's a breakdown of Azure components and the costs at the time of testing:

| Category | Azure Component | Purpose | Qty | Unit Cost | Hourly Rate |
| --- | --- | --- | --- | --- | --- |
| Compute | Virtual Machine Scale Set (120 VMs) | Load Generator VMs (NFS) | 120 | $5.88 | $705.60 |
| Compute | Virtual Machines (VMs) | CNQ Cluster Nodes | 120 | $5.88 | $705.60 |
| Network | Virtual Network (vNet) | Primary Network | 1 | $0 | $0.00 |
| Network | Data Transfer | Data transfer out | 1 | $3.84 | $3.84 |
| Network | Azure Private DNS Zone | Internal DNS resolution | 1 | $0.249 | $0.25 |
| Storage | Storage Accounts | CNQ Persistent Data | 4 | $0 | $0.00 |
| Storage | "Utility" Blob Storage Container | Hosts CNQ images/files | 1 | $0.03 | $0.03 |
| Storage | Premium (V2) SSD LRS | Boot & CNQ write-cache | 240 | $0.0003 | $0.73 |
| Security | Azure Key Vault | Secure credentials and sensitive data storage. | 1 | $0.01 | $0.01 |
| Security | Network Security Groups (NSGs) | Secures access and restricts inbound traffic. | 4 | $0.00 | $0.00 |
| Total | | | | | $1,416.06 |
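As a sanity check, the total can be reproduced by summing the Hourly Rate column. The sketch below uses the values exactly as listed in the table; the short component labels are abbreviations chosen for the example. It also flags any row where quantity times unit cost does not round to the listed hourly rate, which happens for the Premium SSD row, presumably because the unit price shown is rounded or quoted in a different unit. That explanation is an assumption, not something the post states.

```python
from decimal import Decimal

# (component, quantity, unit cost, hourly rate) as listed in the table above
line_items = [
    ("Load generator VM scale set", 120, "5.88", "705.60"),
    ("CNQ cluster VMs", 120, "5.88", "705.60"),
    ("Virtual network", 1, "0", "0.00"),
    ("Data transfer out", 1, "3.84", "3.84"),
    ("Private DNS zone", 1, "0.249", "0.25"),
    ("Storage accounts", 4, "0", "0.00"),
    ("Utility blob container", 1, "0.03", "0.03"),
    ("Premium SSD v2 LRS", 240, "0.0003", "0.73"),
    ("Key Vault", 1, "0.01", "0.01"),
    ("Network security groups", 4, "0.00", "0.00"),
]

# Sum the hourly column as published
total = sum(Decimal(hourly) for _, _, _, hourly in line_items)
print(f"Total hourly rate: ${total:,.2f}")

# Collect rows where qty x unit cost does not round to the listed hourly rate
mismatches = []
for name, qty, unit, hourly in line_items:
    computed = (Decimal(qty) * Decimal(unit)).quantize(Decimal("0.01"))
    if computed != Decimal(hourly):
        mismatches.append((name, computed, Decimal(hourly)))

for name, computed, listed in mismatches:
    print(f"{name}: qty x unit = ${computed}, table lists ${listed}")
```

Using `Decimal` instead of floats keeps the cents exact, which matters when comparing against a published total.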

This flexible and scalable approach ensures that organizations of all sizes can achieve unparalleled throughput without the financial burden and rigidity of traditional infrastructure.

The nearest non-Azure-based comparison for performance and cost structure is self-managed Lustre, which results in a total hourly rate of about $3,441/hr:

| Category | Azure Component | Purpose | Qty | Unit Cost | Hourly Rate |
| --- | --- | --- | --- | --- | --- |
| Compute | Virtual Machines (HB176-48rs_v4) | Lustre MDS, MDT, OSS, OST, MGS, MGT | 256 | $8.42 | $2,155.52 |
| Compute | Virtual Machines (HB176-48rs_v4) | Linux Clients | 80 | $8.42 | $673.60 |
| Network | Virtual Network (vNet) | Primary Network | 1 | $0.00 | $0.00 |
| Network | Data Transfer | Data transfer out | 1 | $3.84 | $3.84 |
| Storage | Premium (V2) SSD LRS | Lustre OSS Disks | 950 | $0.64 | $608.00 |
| Total | | | | | $3,440.96 |
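For the Lustre estimate, quantity times unit cost reproduces every row, so the comparison can be worked directly from the table. The sketch below recomputes the Lustre total and its ratio to the $1,416.06 CNQ total given earlier in this post. The labels are shortened for the example, and the ratio only compares the two published hourly estimates; it says nothing about other regions, workloads, or configurations.

```python
# (component, quantity, unit cost in USD per hour) from the Lustre table above
lustre_items = [
    ("Lustre servers (HB176-48rs_v4)", 256, 8.42),
    ("Linux clients (HB176-48rs_v4)", 80, 8.42),
    ("Virtual network", 1, 0.00),
    ("Data transfer out", 1, 3.84),
    ("Premium SSD v2 LRS (OSS disks)", 950, 0.64),
]

CNQ_HOURLY_TOTAL = 1416.06  # total from the CNQ cost table earlier in the post

# Rebuild the Lustre hourly total from quantity x unit cost
lustre_total = round(sum(qty * unit for _, qty, unit in lustre_items), 2)
ratio = lustre_total / CNQ_HOURLY_TOTAL

print(f"Lustre hourly total: ${lustre_total:,.2f}")
print(f"Lustre / CNQ hourly cost ratio: {ratio:.2f}x")
```

On these figures the self-managed Lustre estimate comes to roughly 2.4 times the CNQ hourly rate.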

While these estimates provide a general understanding of the hourly costs, it's important to note that actual expenses will vary based on specific workloads, Azure region, and individual customer requirements. By leveraging Azure's granular pricing models and diverse service offerings, organizations can efficiently achieve exceptional throughput with predictable and manageable expenses.

The linearity of cost-to-performance is highly predictable, which gives customers the ability to accurately predict business outcomes and expenses. We’ve worked to demystify the complexity of cloud pricing and configuration performance, resulting in greater value for business decision makers and technical decision makers alike. (Costs shown reflect CNQ system costs, not client/application VMs.)

Beyond 1TB/s: The Future of Data Velocity

Building a high-performance data acceleration and storage platform gives you a launchpad for future innovations. The architecture provides a robust foundation for advancements such as:

| Innovation | Description |
| --- | --- |
| Enhanced Resilience and Availability | Implementing multi-region configurations to further enhance disaster recovery capabilities and ensure high availability for mission-critical applications. |
| Simplified Global Workflows | Streamlining operational complexity for data-intensive workflows spanning geographically dispersed locations, enabling seamless global collaboration. |
| Sustainable Technology | Minimizing environmental impact by leveraging Azure's energy-efficient cloud solutions, aligning technology goals with sustainability initiatives. |

Qumulo is committed to continuous innovation in high-performance cloud storage, ensuring our customers remain at the forefront of data-driven advancements.

Call to Action

By incorporating ANQ or CNQ into your data estate, you can future-proof your platform performance requirements with predictable economics. You can launch your ANQ environment directly from the Azure Portal, or set up a CNQ POC by contacting azure@qumulo.com or ordering through the Azure Marketplace.

Schedule a free consultation today and discover how to achieve unparalleled data velocity with CNQ on Azure.

Updated Jan 14, 2025
Version 2.0