Recent Blogs
Co-Author:
Ansys/Synopsys
Roman Walsh
Madeleine Driver
Jun 11, 2026
34 Views
0 likes
0 Comments
10 MIN READ
The Paradigm Shift in Model Training
The conventional wisdom in deep learning has been simple: bigger models require bigger infrastructure. Training a 100-billion-parameter language model tradition...
May 28, 2026
159 Views
1 like
0 Comments
8 MIN READ
Every team that operates GPU clusters for AI has seen this pattern. The cluster boots, GPUs are visible, and scheduling works at a basic level. Then the first distributed training run stalls in NCCL ...
May 22, 2026
192 Views
1 like
0 Comments
Standing up an N-node training or inference job and waiting forever for the model checkpoint to land on every node's NVMe? Here's a small Rust + MPI tool — azcp-cluster — that pays Azure egress once,...
May 06, 2026
187 Views
0 likes
0 Comments
By Valerie Cutts and Jithin Jose
Last fall we introduced Fairwater, the world’s most powerful AI datacenter. Delivering a system of this scale required rethinking how Azure designs supercompute...
May 06, 2026
4.5K Views
4 likes
1 Comment
3 MIN READ
Training large AI models on hundreds or thousands of nodes introduces a critical operational challenge: when a distributed job fails, quickly identifying the root cause across scattered logs can beco...
Mar 31, 2026
175 Views
4 likes
0 Comments
NCv6 Virtual Machines are Azure's flexible, next generation platform enabling both leading-edge graphics and generative AI compute workloads. Featuring NVIDIA RTX PRO 6000 Blackwell Server Edition ...
Mar 18, 2026
1.5K Views
3 likes
0 Comments
11 MIN READ
If you had to point out the top trends of IT these days, two strong candidates would be Generative AI and Cybersecurity. Especially around the latter, sophistication, reach and volume of cyberattacks...
Mar 09, 2026
618 Views
1 like
0 Comments
5 MIN READ
Microsoft returns to NVIDIA GTC 2026 in San Jose with a strong presence across conference sessions, in‑booth theater talks, live demos, and executive‑level ancillary events. Together with NVIDIA an...
Feb 27, 2026
21K Views
1 like
0 Comments
3 MIN READ
As AI models continue to scale in size and complexity, cloud infrastructure must deliver more than theoretical peak performance. What matters in practice is reliable, end-to-end, workload-level AI pe...
Feb 18, 2026
431 Views
0 likes
0 Comments
Tags
- hpc (256 Topics)
- ai infrastructure (113 Topics)
- virtual machines (78 Topics)
- benchmarking (59 Topics)
- storage (22 Topics)
- updates (20 Topics)
- events (19 Topics)
- ramp up with me (13 Topics)
- msignite (2 Topics)
- Microsoft Ignite 2023 (1 Topic)