User Profile

CormacGarvey

Microsoft

Joined 7 years ago

19 posts, 32 likes

Recent Blog Articles

Performance analysis of DeepSeek R1 AI Inference using vLLM on ND-H100-v5
Introduction The DeepSeek R1 model represents a new frontier in large-scale reasoning for AI applications. Designed to tackle complex inference tasks, R1 pushes the boundaries of what’s possible—bu...
Aug 28, 2025 | Azure High Performance Computing (HPC) Blog | 547 views, 0 likes, 0 comments
Inference performance of Llama 3.1 8B using vLLM across various GPUs and CPUs
Introduction Following our previous evaluation of Llama 3.1 8B inference performance on Azure’s ND-H100-v5 infrastructure using vLLM, this report broadens the scope to compare inference performance...
Aug 26, 2025 | Azure High Performance Computing (HPC) Blog | 858 views, 0 likes, 0 comments
Performance of Llama 3.1 8B AI Inference using vLLM on ND-H100-v5
Introduction The pace of development in large language models (LLMs) has continued to accelerate as the global AI community races toward the goal of artificial general intelligence (AGI). Today’s m...
Aug 26, 2025 | Azure High Performance Computing (HPC) Blog | 609 views, 1 like, 0 comments
GPU node health checks integrated into Azure Kubernetes Service via Node Problem Detector
The AzureHPC node health repository provides a suite of recommended node health checks for all Azure specialized SKUs (including GPUs). In this blog post we will show how to integrate the GPU node ...
Jun 27, 2024 | Azure High Performance Computing (HPC) Blog | 3.8K views, 1 like, 0 comments
HPC/AI storage options for NDm_v4 (A100) Azure Kubernetes Service (AKS) cluster
We will show how to set up and use popular Azure HPC/AI storage options (such as local NVMe SSDs, Azure Managed Lustre File System (AMLFS), and Azure Files with NFSv4) in an NDm_v4 AKS cluster and provide I...
Jun 15, 2023 | Azure High Performance Computing (HPC) Blog | 4.2K views, 0 likes, 0 comments
Deploy NDm_v4 (A100) Kubernetes Cluster
We show how to deploy an optimal NDm_v4 (A100) AKS cluster, making sure that all 8 GPUs and 8 InfiniBand devices available on each virtual machine come up correctly and are available to deliver optima...
Jun 03, 2023 | Azure High Performance Computing (HPC) Blog | 9.7K views, 5 likes, 2 comments
E2E deployment of a production-ready NDv4 (A100) cluster targeting large deep learning training
The NDv4 series is very popular for running large deep learning training jobs, which require high floating-point performance and high interconnect bandwidth. In this article we will walk throug...
Jul 22, 2022 | Azure High Performance Computing (HPC) Blog | 3.3K views, 2 likes, 0 comments
Automated HPC/AI compute node health-checks Integrated with the SLURM scheduler
It is best practice to run health checks on compute nodes before running jobs; this is especially important for tightly coupled HPC/AI applications. The virtual machines that fail the health-checks s...
Feb 04, 2022 | Azure High Performance Computing (HPC) Blog | 6.1K views, 1 like, 0 comments
HPC/AI Cluster resource utilization monitoring using Azure Monitor
Monitoring is a crucial aspect of managing a high-performance computing (HPC) or AI cluster. Here we will focus specifically on resource utilization monitoring using a custom data collector and the A...
Feb 02, 2022 | Azure High Performance Computing (HPC) Blog | 10K views, 2 likes, 2 comments
Performance considerations for large scale deep learning training on Azure NDv4 (A100) series
Modern deep learning (DL) training jobs require large clusters of multi-GPU nodes with high floating-point performance, connected by high-bandwidth, low-latency networks. The Azure NDv4 VM series is designed specificall...
Aug 28, 2021 | Azure High Performance Computing (HPC) Blog | 19K views, 4 likes, 0 comments