Azure High Performance Computing (HPC) Blog

Azure CycleCloud 8.8 and CCWS 1.2 at SC25 and Ignite

anhoward
Nov 17, 2025

Azure CycleCloud continues to evolve as the backbone for orchestrating high-performance computing (HPC) and AI workloads in the cloud. With the release of CycleCloud 8.8, users gain access to a suite of new features designed to streamline cluster management, enhance health monitoring, and future-proof their HPC environments.

Azure CycleCloud 8.8: Advancing HPC & AI Workloads with Smarter Health Checks

Key Features in CycleCloud 8.8

1. ARM64 HPC Support

CycleCloud 8.8 expands its hardware compatibility with ARM64 HPC support, opening new possibilities for energy-efficient and cost-effective compute clusters. This includes general ARM64 support as well as access to the newer generation of GB200 VMs, enabling AI workloads at larger scale than before.

2. Slurm Topology-Aware Scheduling

The integration of topology-aware scheduling for Slurm clusters allows CycleCloud users to optimize job placement based on network and hardware topology. This leads to improved performance for tightly coupled HPC workloads and better utilization of available resources.
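To make the idea concrete, here is a minimal sketch of topology-aware placement. It is an illustration only, not Slurm's or CycleCloud's actual algorithm: the `place_job` function, the switch names, and the node names are all hypothetical, and the topology is assumed to be a flat set of leaf switches with disjoint node lists. The job is placed under a single switch when one has enough idle nodes; otherwise it spans as few switches as possible.

```python
from typing import Dict, List, Set


def place_job(switches: Dict[str, List[str]], idle: Set[str], count: int) -> List[str]:
    """Pick `count` idle nodes while touching as few leaf switches as possible.

    `switches` maps a (hypothetical) leaf switch name to the nodes behind it.
    Node lists are assumed to be disjoint.
    """
    # Idle nodes available under each leaf switch, in their listed order.
    free = {sw: [n for n in nodes if n in idle] for sw, nodes in switches.items()}

    # Best case: one switch can hold the whole job. Prefer the tightest fit
    # so that larger blocks of idle nodes stay available for bigger jobs.
    fitting = [sw for sw, nodes in free.items() if len(nodes) >= count]
    if fitting:
        best = min(fitting, key=lambda sw: (len(free[sw]), sw))
        return free[best][:count]

    # Otherwise span switches, taking the ones with the most idle nodes first.
    chosen: List[str] = []
    for sw in sorted(free, key=lambda sw: (-len(free[sw]), sw)):
        chosen.extend(free[sw][: count - len(chosen)])
        if len(chosen) == count:
            return chosen
    raise ValueError("not enough idle nodes for this job")


if __name__ == "__main__":
    topology = {
        "sw0": ["n1", "n2", "n3", "n4"],
        "sw1": ["n5", "n6"],
        "sw2": ["n7", "n8", "n9"],
    }
    idle_nodes = {"n1", "n2", "n5", "n6", "n7", "n8", "n9"}
    print(place_job(topology, idle_nodes, 3))  # fits under one switch
    print(place_job(topology, idle_nodes, 5))  # has to span two switches
```

Keeping a tightly coupled job under one switch reduces the number of network hops between its ranks, which is where the performance benefit for MPI-style workloads comes from.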

3. NVIDIA MNNVL and IMEX Support

With expanded support for NVIDIA Multi-Node NVLink (MNNVL) and the Internode Memory Exchange service (IMEX), CycleCloud 8.8 ensures compatibility with the latest GPU technologies. This enables users to run AI training, inference, and scientific simulations on current-generation hardware.

4. HealthAgent: Event-Driven Health Monitoring and Alerting

A standout feature in this release is the enhanced HealthAgent, which delivers event-driven health monitoring and alerting. CycleCloud now proactively detects issues across clusters, nodes, and interconnects, and surfaces them as real-time notifications, which helps maintain uptime and reliability in large-scale HPC deployments. HealthAgent supports two kinds of health checks: impactful checks, which can only run while a node is idle, and non-impactful checks, which can run throughout the lifecycle of a job. This lets CycleCloud alert not only on issues that occur while nodes are starting, but also on failures that develop on long-running nodes.
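The split between impactful and non-impactful checks can be sketched as a simple gate on node state. This is a hypothetical illustration, not HealthAgent's API: the `HealthCheck` class, the `runnable_checks` and `evaluate` helpers, and the check names are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class HealthCheck:
    name: str
    impactful: bool          # True: would disturb a running job, so idle nodes only
    run: Callable[[], bool]  # returns True when the check passes


def runnable_checks(checks: List[HealthCheck], node_is_idle: bool) -> List[HealthCheck]:
    """Non-impactful checks always qualify; impactful ones only on idle nodes."""
    return [c for c in checks if node_is_idle or not c.impactful]


def evaluate(checks: List[HealthCheck], node_is_idle: bool) -> Dict[str, bool]:
    """Run every check allowed in the current node state and collect results."""
    return {c.name: c.run() for c in runnable_checks(checks, node_is_idle)}


if __name__ == "__main__":
    # Placeholder checks with canned results; real checks would probe hardware.
    checks = [
        HealthCheck("gpu_stress_test", impactful=True, run=lambda: True),
        HealthCheck("ecc_error_count", impactful=False, run=lambda: False),
    ]
    results = evaluate(checks, node_is_idle=False)  # a job is running
    failed = [name for name, ok in results.items() if not ok]
    print("checks run:", sorted(results), "failed:", failed)
```

With this shape, the same check list can be evaluated at node start, between jobs, and during a job, and only the safe subset runs while work is in progress.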

Later releases of CycleCloud will also include automatic remediation for common failures, so stay tuned!

5. Enterprise Linux 9 and Ubuntu 24.04 Support

One common request has been wider support for the various Enterprise Linux (EL) 9 variants, including RHEL 9, AlmaLinux 9, and Rocky Linux 9. CycleCloud 8.8 introduces support for those distributions as well as the latest Ubuntu HPC release.

Why These Features Matter

The CycleCloud 8.8 release is a significant step forward for organizations running HPC and AI workloads in Azure. The improved health check support, anchored by HealthAgent with automatic remediation planned for later releases, means less downtime, faster troubleshooting, and greater confidence in cloud-based research and innovation.

Whether you’re managing scientific simulations, AI model training, or enterprise analytics, CycleCloud’s latest features help you build resilient, scalable, and future-ready HPC environments.

Key Features in CycleCloud Workspace for Slurm 1.2

Along with the release of CycleCloud 8.8 comes a new CycleCloud Workspace for Slurm (CCWS) release. This release includes the General Availability of features that were previously in preview, such as Open OnDemand, Cendio ThinLinc, and managed Grafana monitoring capabilities. 

In addition to previously announced features, CCWS 1.2 also includes support for a new Hub and Spoke deployment model. This allows customers to retain a central hub of shared resources that can be reused between cluster deployments, with "disposable" spoke clusters that branch from the hub. Hub and Spoke deployments help customers who need to redeploy clusters in order to upgrade their operating system, deploy new versions of software, or even reconfigure the overall architecture of their Slurm clusters, without rebuilding the shared resources each time.

 

Come visit us at SC25 and MS Ignite

To learn more about these features, come visit us at the Microsoft booth at #SC25 in St. Louis, MO and #Microsoft #Ignite in San Francisco this week!

Updated Nov 17, 2025
Version 1.0