Recent Blogs
Standing up an N-node training or inference job and waiting forever for the model checkpoint to land on every node's NVMe? Here's a small Rust + MPI tool — azcp-cluster — that pays Azure egress once,...
May 06, 2026 · 47 Views
0 likes
0 Comments
By Valerie Cutts and Jithin Jose
Last fall we introduced Fairwater, the world’s most powerful AI datacenter. Delivering a system of this scale required rethinking how Azure designs supercompute...
May 06, 2026 · 647 Views
1 like
0 Comments
3 MIN READ
Training large AI models on hundreds or thousands of nodes introduces a critical operational challenge: when a distributed job fails, quickly identifying the root cause across scattered logs can beco...
Mar 31, 2026 · 157 Views
4 likes
0 Comments
NCv6 Virtual Machines are Azure's flexible, next generation platform enabling both leading-edge graphics and generative AI compute workloads. Featuring NVIDIA RTX PRO 6000 Blackwell Server Edition ...
Mar 18, 2026 · 1K Views
2 likes
0 Comments
11 MIN READ
If you had to point out the top trends of IT these days, two strong candidates would be Generative AI and Cybersecurity. Especially around the latter, sophistication, reach and volume of cyberattacks...
Mar 09, 2026 · 555 Views
1 like
0 Comments
5 MIN READ
Microsoft returns to NVIDIA GTC 2026 in San Jose with a strong presence across conference sessions, in‑booth theater talks, live demos, and executive‑level ancillary events. Together with NVIDIA an...
Feb 27, 2026 · 21K Views
1 like
0 Comments
3 MIN READ
As AI models continue to scale in size and complexity, cloud infrastructure must deliver more than theoretical peak performance. What matters in practice is reliable, end-to-end, workload-level AI pe...
Feb 18, 2026 · 365 Views
0 likes
0 Comments
Imagine having several clusters across different environments (dev, test and prod) or planning a migration between PBS and Slurm or porting codes to a different system. They can all seem like dauntin...
Feb 06, 2026 · 325 Views
0 likes
0 Comments
8 MIN READ
Automotive Design and the DrivAerNet++ Benchmark
In automotive design, external aerodynamics have a direct impact on performance, energy efficiency, and development cost. Even small reductions in d...
Jan 12, 2026 · 613 Views
0 likes
0 Comments
When running containerized workloads on HPC clusters, one of the first problems you hit is getting container images onto the nodes quickly and repeatably. A .sqsh is a Squashfs image (commonly used b...
Jan 09, 2026 · 406 Views
1 like
0 Comments
Tags
- hpc (254 Topics)
- ai infrastructure (109 Topics)
- virtual machines (77 Topics)
- benchmarking (57 Topics)
- storage (22 Topics)
- updates (20 Topics)
- events (19 Topics)
- ramp up with me (13 Topics)
- msignite (2 Topics)
- Microsoft Ignite 2023 (1 Topic)