Azure Monitor managed service for Prometheus now includes native Grafana dashboards
We are excited to announce that Azure Monitor managed service for Prometheus now includes native Grafana dashboards within the Azure portal at no additional cost. This integration marks a major milestone in our mission to simplify observability, reducing the administrative overhead and complexity of deploying and maintaining your own Grafana instances.

The use of open-source observability tools continues to grow for cloud-native scenarios such as application and infrastructure monitoring using Prometheus metrics and OpenTelemetry logs and traces. For these scenarios, DevOps and SRE teams need streamlined and cost-effective access to industry-standard tooling like Prometheus metrics and Grafana dashboards within their cloud-hosted environments. For many teams, this usually means deploying and managing separate monitoring stacks built on self-hosted or partner-managed Prometheus and Grafana. However, Azure Monitor's latest Grafana integration provides this capability out of the box, enabling you to view Prometheus metrics and other Azure observability data in Grafana dashboards fully integrated into the Azure portal.

Azure Monitor dashboards with Grafana delivers powerful visualization and data transformation capabilities on Prometheus metrics, Azure resource metrics, logs, and traces stored in Azure Monitor. Pre-built dashboards are included for several key scenarios such as Azure Kubernetes Service, Azure Container Apps, Container Insights, and Application Insights.

Why Grafana in Azure portal?

Grafana is a widely adopted visualization tool used with Prometheus metrics and cloud-native observability tooling. Embedding it natively in the Azure portal offers:

Unified Azure experience: No additional RBAC or network configuration is required; users' Azure login credentials and Azure RBAC are used to access dashboards and data. View Grafana dashboards alongside all your other Azure resources and Azure Monitor views in the same portal.
No management overhead or compute costs: Dashboards with Grafana use a fully SaaS model built into Azure Monitor, where you do not have to administer the Grafana server or the compute on which it runs.
Access to community dashboards: Open-source and Grafana community dashboards using Prometheus or Azure Monitor data sources can be imported with no modifications.

These capabilities mean faster troubleshooting, deeper insights, and a more consistent observability platform for Azure-centric workloads.

Figure 1: Dashboards with Grafana landing page in the context of an Azure Monitor workspace in the Azure portal

Getting Started

To get started, enable Managed Prometheus for your AKS cluster, then navigate to the Azure Monitor workspace or AKS cluster in the Azure portal and select Monitoring > Dashboards with Grafana (preview). From this page you can view, edit, create, and import Grafana dashboards. Simply click on one of the pre-built dashboards to get started. You can use these dashboards as provided, or edit and add panels, update visualizations, and create variables to build your own custom dashboards. With this approach, no Grafana servers or additional Azure resources need to be provisioned or maintained. Teams can quickly adopt and customize Grafana dashboards within the Azure portal, reducing deployment and management time while still gaining the benefits of dashboards and visualizations to improve monitoring and speed up troubleshooting.
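If you prefer to script the prerequisite, the sketch below shows one way to enable Managed Prometheus on an existing AKS cluster from the CLI. The resource names and the workspace resource ID are placeholders, and flag availability can vary with your Azure CLI version, so treat this as illustrative rather than definitive.

# Illustrative sketch: enable managed Prometheus metrics collection on an
# existing AKS cluster and link it to an Azure Monitor workspace.
# All names and the workspace resource ID below are placeholders.
az aks update \
  --resource-group MyResourceGroup \
  --name MyAKSCluster \
  --enable-azure-monitor-metrics \
  --azure-monitor-workspace-resource-id "/subscriptions/<sub-id>/resourceGroups/MyResourceGroup/providers/microsoft.monitor/accounts/MyMonitorWorkspace"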
Figure 2: Kubernetes Compute Resources dashboard viewed in the context of an Azure Monitor workspace in the Azure portal

When to upgrade to Azure Managed Grafana?

Dashboards with Grafana in the Azure portal cover the most common Prometheus scenarios, but Azure Managed Grafana remains the right choice for several advanced use cases, including:

Extended data source support for non-Azure data sources, e.g. open-source and third-party data stores
Private networking and advanced authentication options
Multi-cloud, hybrid, and on-premises data source connectivity

See When to use Azure Managed Grafana for more details. Get started with Azure Monitor dashboards with Grafana today.

Generally Available - High scale mode in Azure Monitor - Container Insights
Container Insights is Azure Monitor's solution for collecting logs from your Azure Kubernetes Service (AKS) clusters. As the adoption of AKS continues to grow, we are seeing an increasing number of customers with log scaling needs that hit the limits of log collection in Container Insights. Last August, we announced the public preview of High Scale mode in Container Insights to help customers achieve higher log collection throughput from their AKS clusters. Today, we are happy to announce the General Availability of High Scale mode.

High Scale mode is ideal for customers approaching or exceeding 10,000 logs/sec from a single node. When High Scale mode is enabled, Container Insights makes multiple configuration changes that lead to higher overall throughput. These include using a more powerful agent setup, using a different data pipeline, allocating more memory for the agent, and more. All these changes are made in the background by the service and do not require input or configuration from customers. High Scale mode impacts only the data collection layer (with a new DCR); the rest of the experience remains the same. Data flows to the existing tables, and your queries and alerts work as before.

High Scale mode is available to all customers. Today, High Scale mode is turned off by default. In the future, we plan to enable it by default for all customers to reduce the chances of log loss when workloads scale. To get started with High Scale mode, please see our documentation at https://aka.ms/cihsmode
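As a rough sketch of what enabling this from the CLI can look like: the --enable-high-log-scale-mode flag on the monitoring addon is an assumption based on the aks-preview extension and may differ from the current syntax, so confirm it against the documentation linked above before running.

# Hedged sketch: enable the Container Insights (monitoring) addon with
# High Scale mode on an existing cluster. The --enable-high-log-scale-mode
# flag is an assumption; verify the exact name at https://aka.ms/cihsmode.
az aks enable-addons \
  --addons monitoring \
  --resource-group MyResourceGroup \
  --name MyAKSCluster \
  --enable-high-log-scale-mode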
Securing Cloud Shell Access to AKS

Azure Cloud Shell is an online shell hosted by Microsoft that provides instant access to a command-line interface, enabling users to manage Azure resources without needing local installations. Cloud Shell comes equipped with popular tools and programming languages, including Azure CLI, PowerShell, and the Kubernetes command-line tool (kubectl). Using Cloud Shell can provide several benefits for administrators who need to work with AKS, especially if they need quick access from anywhere, or are in locked-down environments:

Immediate Access: There's no need for local setup; you can start managing Azure resources directly from your web browser.
Persistent Storage: Cloud Shell offers a file share in Azure, keeping your scripts and files accessible across multiple sessions.
Pre-Configured Environment: It includes built-in tools, saving time on installation and configuration.

The Challenge of Connecting to AKS

By default, Cloud Shell traffic to AKS originates from a random Microsoft-managed IP address, rather than from within your network. As a result, the AKS API server must be publicly accessible with no IP restrictions, which poses a security risk as anyone on the internet can attempt to reach it. While credentials are still required, restricting access to the API server significantly enhances security. Fortunately, there are ways to lock down the API server while still enabling access via Cloud Shell, which we'll explore in the rest of this article.

Options for Securing Cloud Shell Access to AKS

Several approaches can be taken to secure access to your AKS cluster while using Cloud Shell:

IP Allow Listing

On AKS clusters with a public API server, it is possible to lock down access to the API server with an IP allow list. Each Cloud Shell instance has a randomly selected outbound IP coming from the Azure address space whenever a new session is deployed. This means we cannot allow access to these IPs in advance, but we can apply them once our session is running, and this will work for the duration of our session. Below is an example script that you could run from Cloud Shell to check the current outbound IP address and allow it on your AKS cluster's authorised IP list.

#!/usr/bin/env bash
set -euo pipefail
RG="$1"; AKS="$2"

IP="$(curl -fsS https://api.ipify.org)"
echo "Adding ${IP} to allow list"

CUR="$(az aks show -g "$RG" -n "$AKS" --query "apiServerAccessProfile.authorizedIpRanges" -o tsv | tr '\t' '\n' | awk 'NF')"
NEW="$(printf "%s\n%s/32\n" "$CUR" "$IP" | sort -u | paste -sd, -)"

if az aks update -g "$RG" -n "$AKS" --api-server-authorized-ip-ranges "$NEW" >/dev/null; then
  echo "IP ${IP} applied successfully"
else
  echo "Failed to apply IP ${IP}" >&2
  exit 1
fi

This method comes with some caveats:

The users running the script would need to be granted permissions to update the authorised IP ranges in AKS - this permission could be used to add any IP address.
This script will need to be run each time a Cloud Shell session is created, and can take a few minutes to run.
The script only deals with adding IPs to the allow list; you would also need to implement a process to remove these IPs on a regular basis to avoid building up a long list of IPs that are no longer needed (a sketch of such a cleanup step is shown after this list).
Adding Cloud Shell IPs in bulk, through Service Tags or similar, will result in your API server being accessible to a much larger range of IP addresses, and should be avoided.
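To address the cleanup caveat above, a companion script along the following lines can remove the current session's IP when you are done. It is a minimal sketch using the same az aks show/update commands as the script above; the filtering logic is illustrative and assumes the entry was added as a /32.

#!/usr/bin/env bash
# Minimal cleanup sketch: remove this session's outbound IP from the AKS
# authorised IP ranges. Requires the same permissions as the script above.
set -euo pipefail
RG="$1"; AKS="$2"

IP="$(curl -fsS https://api.ipify.org)"
CUR="$(az aks show -g "$RG" -n "$AKS" --query "apiServerAccessProfile.authorizedIpRanges" -o tsv | tr '\t' '\n' | awk 'NF')"

# Drop the /32 entry for this session's IP; '|| true' keeps the script going
# if the entry has already been removed.
NEW="$(printf "%s\n" "$CUR" | grep -v "^${IP}/32$" | paste -sd, - || true)"

# Caution: an empty range list would remove all restrictions, so only update
# when at least one entry remains.
if [ -n "$NEW" ]; then
  echo "Removing ${IP} from the allow list"
  az aks update -g "$RG" -n "$AKS" --api-server-authorized-ip-ranges "$NEW" >/dev/null
fi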
Command Invoke Azure provides a feature known as Command Invoke that allows you to send commands to be run in AKS, without the need for direct network connectivity. This method executes a container within AKS to run your command and then return the result, and works well from within Cloud Shell. This is probably the simplest approach that works with a locked down API server and the quickest to implement. However, there are some downsides: Commands take longer to run - when you execute the command, it needs to run a container in AKS, execute the command and then return the result. You only get exitCode and text output, and you lose API level details. All commands must be run within the context of the az aks command invoke CLI command, making commands much longer and complex to execute, rather than direct access with Kubectl Command Invoke can be a practical solution for occasional access to AKS, especially when the cost or complexity of alternative methods isn't justified. However, its user experience may fall short if relied upon as a daily tool. Further Details: Access a private Azure Kubernetes Service (AKS) cluster using the command invoke or Run command feature - Azure Kubernetes Service | Microsoft Learn Cloud Shell vNet Integration It is possible to deploy Cloud Shell into a virtual network (vNet), allowing it to route traffic via the vNet, and so access resources using private network, Private Endpoints, or even public resources, but using a NAT Gateway or Firewall for consistent outbound IP address. This approach uses Azure Relay to provide secure access to the vNet from Cloud Shell, without the need to open additional ports. When using Cloud Shell in this way, it does introduce additional cost for the Azure Relay service. Using this solution will require two different approaches, depending on whether you are using a private or public API server. When using a Private API server, which is either directly connected to the vNet, or configured with Private Endpoints, Cloud Shell will be able to connect directly to the private IP of this service over the vNet When using a Public API server, with a public IP, traffic for this will still leave the vNet and go to the internet. The benefit is that we can control the public IP used for the outbound traffic using a Nat Gateway or Azure Firewall. Once this is configured, we can then allow-list this fixed IP on the AKS API server authorised IP ranges. Further Details: Use Cloud Shell in an Azure virtual network | Microsoft Learn Azure Bastion Azure Bastion provides secure and seamless RDP and SSH connectivity to your virtual machines (VMs) directly from the Azure portal, without exposing them to the public internet. Recently, Bastion has also added support for direct connection to AKS with SSH, rather than needing to connect to a jump box and then use Kubectl from there. This greatly simplifies connecting to AKS, and also reduces the cost. Using this approach, we can deploy a Bastion into the vNet hosting AKS. From Cloud Shell we can then use the following command to create a tunnel to AKS. az aks bastion --name <aks name> --resource-group <resource group name> --bastion <bastion resource ID> Once this tunnel is connected, we can run Kubectl commands without any need for further configuration. 
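As a simple illustration (assuming kubectl credentials have already been obtained, for example with az aks get-credentials), once the tunnel is up you can run standard commands such as:

# Run ordinary kubectl commands through the Bastion tunnel
kubectl get nodes
kubectl get pods --all-namespaces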
As with Cloud Shell network integration, we take two slightly different approaches depending on whether the API server is public or private:

When using a Private API server, which is either directly connected to the vNet or configured with Private Endpoints, Cloud Shells connected via Bastion will be able to connect directly to the private IP of this service over the vNet.
When using a Public API server, with a public IP, traffic for this will still leave the vNet and go to the internet. As with Cloud Shell vNet integration, we can configure this to use a static outbound IP and allow list this on the API server. Using Bastion, we can still use NAT Gateway or Azure Firewall to achieve this; however, you can also allow list the public IP assigned to the Bastion, removing the cost for NAT Gateway or Azure Firewall if these are not required for anything else.

Connecting to AKS directly from Bastion requires the use of the Standard or Premium SKU of Bastion, which does have additional cost over the Developer or Basic SKU. This feature also requires that you enable native client support.

Further details: Connect to AKS Private Cluster Using Azure Bastion (Preview) - Azure Bastion | Microsoft Learn

Summary of Options

IP Allow Listing
The outbound IP addresses for Cloud Shell instances can be added to the authorised IP list for your API server. As these IPs are dynamically assigned to sessions, they would need to be added at runtime to avoid adding a large list of IPs and reducing security. This can be achieved with a script. While easy to implement, this requires additional time to run the script with every new session, and increases the overhead of managing the authorised IP list to remove unused IPs.

Command Invoke
Command Invoke allows you to run commands against AKS without requiring direct network access or any setup. This is a convenient option for occasional tasks or troubleshooting, but it's not designed for regular use due to its limited user experience and flexibility.

Cloud Shell vNet Integration
This approach connects Cloud Shell directly to your virtual network, enabling secure access to AKS resources. It's well-suited for environments where Cloud Shell is the primary access method and offers a more secure and consistent experience than default configurations. It does involve additional cost for Azure Relay.

Azure Bastion
Azure Bastion provides a secure tunnel to AKS that can be used from Cloud Shell or by users running the CLI locally. It offers strong security by eliminating public exposure of the API server and supports flexible access for different user scenarios, though it does require setup and may incur additional cost.

Cloud Shell is a great tool for providing pre-configured, easily accessible CLI instances, but in the default configuration it can require some security compromises. With a little work, it is possible to make Cloud Shell work with a more secure configuration that limits how much exposure is needed for your AKS API server.

Deploying Azure ND H100 v5 Instances in AKS with NVIDIA MIG GPU Slicing
In this article we will cover: AKS Cluster Deployment (Latest Version) – creating an AKS cluster using the latest Kubernetes version. GPU Node Pool Provisioning – adding an ND H100 v5 node pool on Ubuntu, with --skip-gpu-driver-install to disable automatic driver installation. NVIDIA H100 MIG Slicing Configurations – available MIG partition profiles on the H100 GPU and how to enable them. Workload Recommendations for MIG Profiles – choosing optimal MIG slice sizes for different AI/ML and HPC scenarios. Best Practices for MIG Management and Scheduling – managing MIG in AKS, scheduling pods, and operational tips. AKS Cluster Deployment (Using the Latest Version) Install/Update Azure CLI: Ensure you have Azure CLI 2.0.64+ (or Azure CLI 1.0.0b2 for preview features). This is required for using the --skip-gpu-driver-install option and other latest features. Install the AKS preview extension if needed: az extension add --name aks-preview az extension update --name aks-preview (Preview features are opt-in; using the preview extension gives access to the latest AKS capabilities) Create a Resource Group: If not already done, create an Azure resource group for the cluster: az group create -n MyResourceGroup -l eastus Create the AKS Cluster: Run az aks create to create the AKS control plane. You can start with a default system node pool (e.g. a small VM for system pods) and no GPU nodes yet. For example: az aks create -g MyResourceGroup -n MyAKSCluster \ --node-vm-size Standard_D4s_v5 \ --node-count 1 \ --kubernetes-version <latest-stable-version> \ --enable-addons monitoring This creates a cluster named MyAKSCluster with one standard node. Use the --kubernetes-version flag to specify the latest AKS-supported Kubernetes version (or omit it to get the default latest). As of early 2025, AKS supports Kubernetes 1.27+; using the newest version ensures support for features like MIG and the ND H100 v5 SKU. Retrieve Cluster Credentials: Once created, get your Kubernetes credentials: az aks get-credentials -g MyResourceGroup -n MyAKSCluster Verification: After creation, you should have a running AKS cluster. You can verify the control plane is up with: kubectl get nodes Adding an ND H100 v5 GPU Node Pool (Ubuntu + Skip Driver Install) Next, add a GPU node pool using the ND H100 v5 VM size. The ND H100 v5 series VMs each come with 8× NVIDIA H100 80GB GPUs (640 GB total GPU memory), high-bandwidth interconnects, and 96 vCPUs– ideal for large-scale AI and HPC workloads. We will configure this node pool to run Ubuntu and skip the automatic NVIDIA driver installation, since we plan to manage drivers (and MIG settings) manually or via the NVIDIA operator. Steps to add the GPU node pool: Use Ubuntu Node Image: AKS supports Ubuntu 20.04/22.04 for ND H100 v5 nodes. The default AKS Linux OS (Ubuntu) is suitable. We also set --os-sku Ubuntu to ensure we use Ubuntu (if your cluster’s default is Azure Linux, note that Azure Linux is not currently supported for MIG node pools). Add the GPU Node Pool with Azure CLI: Run: az aks nodepool add \ --cluster-name MyAKSCluster \ --resource-group MyResourceGroup \ --name h100np \ --node-vm-size Standard_ND96isr_H100_v5 \ --node-count 1 \ --node-os-type Linux \ --os-sku Ubuntu \ --gpu-driver none \ --node-taints nvidia.com/gpu=true:NoSchedule Let’s break down these parameters: --node-vm-size Standard_ND96isr_H100_v5 selects the ND H100 v5 VM size (96 vCPUs, 8×H100 GPUs). Ensure your subscription has quota for this SKU and region. 
--node-count 1 starts with one GPU VM (scale as needed).
--gpu-driver none tells AKS not to pre-install NVIDIA drivers on the node. This prevents the default driver installation, because we plan to handle drivers ourselves (using NVIDIA's GPU Operator for better control). When using this flag, new GPU nodes come up without NVIDIA drivers until you install them manually or via an operator.
--node-taints nvidia.com/gpu=true:NoSchedule taints the GPU nodes so that regular pods won't be scheduled on them accidentally. Only pods with a matching toleration (e.g. labeled for GPU use) can run on these nodes. This is a best practice to reserve expensive GPU nodes for GPU workloads.

(Optional) You can also add labels if needed. For example, to prepare for MIG configuration with the NVIDIA operator, you might add a label like nvidia.com/mig.config=all-1g.10gb to indicate the desired MIG slicing (explained later). We will address MIG config shortly, so adding such a label now is optional.

Wait for Node Pool to be Ready: Monitor the Azure CLI output or use kubectl get nodes until the new node appears. It should register in Kubernetes (in NotReady state initially while it's configuring). Since we skipped driver install, the node will not have GPU scheduling resources yet (no nvidia.com/gpu resource visible) until we complete the next step.

Installing the NVIDIA Driver Manually (or via GPU Operator)

Because we used --skip-gpu-driver-install, the node will not have the necessary NVIDIA driver or CUDA runtime out of the box. You have two main approaches to install the driver:

Use the NVIDIA GPU Operator (Helm-based) to handle driver installation.
Install drivers manually (e.g., run a DaemonSet that downloads and installs the .run package or Debian packages).

The NVIDIA GPU Operator manages drivers, the Kubernetes device plugin, and GPU monitoring components. AKS GPU node pools normally come with the NVIDIA drivers and container runtime pre-installed, but because we used the --skip-gpu-driver-install flag, we can now deploy the NVIDIA GPU Operator to handle GPU workloads and monitoring, while disabling its driver installation where drivers are already pre-installed (to avoid conflicts). The GPU Operator will deploy the necessary components like the Kubernetes device plugin and the DCGM exporter for monitoring.

2.1 Installing via NVIDIA GPU Operator

Step 1: Add the NVIDIA Helm repository. NVIDIA provides a Helm chart for the GPU Operator. Add the official NVIDIA Helm repo and update it:

helm repo add nvidia https://helm.ngc.nvidia.com/nvidia && helm repo update

This repository contains the gpu-operator chart and other NVIDIA Helm charts.

Step 2: Install the GPU Operator via Helm. Use Helm to install the GPU Operator into a dedicated namespace (e.g., gpu-operator). In AKS, disable the GPU Operator's driver and toolkit deployment (since AKS already has those), and specify the correct container runtime class for NVIDIA. For example:

helm install gpu-operator nvidia/gpu-operator \
  -n gpu-operator --create-namespace \
  --set operator.runtimeClass=nvidia-container-runtime

In the above command, operator.runtimeClass=nvidia-container-runtime aligns with the runtime class name configured on AKS for GPU support.

After a few minutes, Helm should report a successful deployment. For example:

NAME: gpu-operator
LAST DEPLOYED: Fri May 5 15:30:05 2023
NAMESPACE: gpu-operator
STATUS: deployed
REVISION: 1
TEST SUITE: None

You can verify that the GPU Operator's pods are running in the cluster.
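A quick way to check is to list the pods in the namespace used by the Helm install above (gpu-operator in this walkthrough):

# List the components deployed by the GPU Operator Helm chart
kubectl get pods -n gpu-operator -o wide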
The Operator will deploy several DaemonSets including the NVIDIA device plugin, DCGM exporter, and others. For example, after installation you should see pods like the following in the gpu-operator namespace: nvidia-dcgm-exporter-xxxxx 1/1 Running 0 60s nvidia-device-plugin-daemonset-xxxxx 1/1 Running 0 60s nvidia-mig-manager-xxxxx 1/1 Running 0 4m nvidia-driver-daemonset-xxxxx 1/1 Running 0 4m gpu-operator-node-feature-discovery-... 1/1 Running 0 5m ... (other GPU operator pods) ... Here we see the NVIDIA device plugin and NVIDIA DCGM exporter pods running on each GPU node, as well as other components. (Note: In our AKS setup, the nvidia-driver-daemonset may be present but left idle since we disabled driver management.) Step 3: Confirm the operator’s GPU validation. The GPU Operator will run a CUDA validation job to verify everything is working. Check that the CUDA validation pod has completed successfully: kubectl get pods -n gpu-operator -l app=nvidia-cuda-validator Expected output: NAME READY STATUS RESTARTS AGE nvidia-cuda-validator-bpvkt 0/1 Completed 0 3m56s A Completed CUDA validator indicates the GPUs are accessible and the NVIDIA stack is functioning. At this point, you have the NVIDIA GPU Operator (with device plugin and DCGM exporter) installed via Helm on AKS. Verifying MIG on H100 with Node Pool Provisioning Once the driver is installed and the NVIDIA device plugin is running, you can verify MIG. The process is similar to verifying MIG on A100, but the resource naming and GPU partitioning reflect H100 capabilities. Check Node Resources kubectl describe node <h100-node-name> If you chose single MIG strategy, you might see: Allocatable: nvidia.com/gpu: 56 for a node with 8 H100s × 7 MIG slices each = 56. Or: nvidia.com/gpu: 14 if you used MIG2g (which yields 2–3 slices per GPU, depending on the exact profile). If you chose mixed MIG strategy (mig.strategy=mixed), you’ll see something like: Allocatable: nvidia.com/mig-1g.10gb: 56 or whichever MIG slice name is appropriate (e.g., mig-3g.40gb for MIG3g). Confirm MIG in nvidia-smi Run a GPU Workload For instance, run a quick CUDA container: kubectl run mig-test --rm -ti \ --image=nvidia/cuda:12.1.1-runtime-ubuntu22.04 \ --limits="nvidia.com/gpu=1" \ -- bash Inside the container, nvidia-smi should confirm you have a MIG device. Then any CUDA commands (e.g., deviceQuery) should pass, indicating MIG is active and the driver is working. nvidia-smi -L MIG Management on H100 The H100 supports several MIG profiles – predefined ways to slice the GPU. Each profile is denoted by <N>g.<M>gb meaning it uses N GPU compute slices (out of 7) and M GB of memory. Key H100 80GB MIG profiles include: MIG 1g.10gb: Each instance has 1/7 of the SMs and 10 GB memory (1/8 of VRAM). This yields 7 instances per GPU (7 × 10 GB = 70 GB out of 80, a small portion is reserved). This is the smallest slice size and maximizes the number of instances (useful for many lightweight tasks). MIG 1g.20gb: Each instance has 1/7 of SMs but 20 GB memory (1/4 of VRAM), allowing up to 4 instances per GPU. This profile gives each instance more memory while still only a single compute slice – useful for memory-intensive workloads that don’t need much compute. MIG 2g.20gb: Each instance gets 2/7 of SMs and 20 GB memory (2/8 of VRAM). 3 instances can run on one GPU. This offers a balance: more compute per instance than 1g, with a moderate 20 GB memory each. MIG 3g.40gb: Each instance has 3/7 of SMs and 40 GB memory (half the VRAM). Two instances fit on one H100. 
This effectively splits the GPU in half. MIG 4g.40gb: Each instance uses 4/7 of SMs and 40 GB memory. Only one such instance can exist per GPU (because it uses half the memory and more than half of the SMs). In practice, a 4g.40gb profile might be combined with a smaller profile on the same GPU (e.g., a 4g.40gb + a 3g.40gb could occupy one GPU, totaling 7/7 SM and 80GB). However, AKS node pools use a single uniform profile per GPU, so you typically wouldn’t mix profiles on the same GPU in AKS. MIG 7g.80gb: This profile uses the entire GPU (all 7/7 SMs and 80 GB memory). Essentially, MIG 7g.80gb is the full GPU as one instance (no slicing). It’s equivalent to not using MIG at all for that GPU. These profiles illustrate the flexibility: you can trade off number of instances vs. the power of each instance. For example, MIG 1g.10gb gives you seven small GPUs, whereas MIG 3g.40gb gives you two much larger slices (each roughly half of an H100). All MIG instances are hardware-isolated, meaning each instance’s performance is independent (one instance can’t starve others of GPU resources) Enabling MIG in AKS: There are two main ways to configure MIG on the AKS node pool: At Node Pool Creation (Static MIG Profile): Azure allows specifying a GPU instance profile when creating the node pool. For example, adding --gpu-instance-profile MIG1g to the az aks nodepool add command would provision each H100 GPU in 1g mode (e.g., 7×10GB instances per GPU). Supported profile names for H100 include MIG1g, MIG2g, MIG3g, MIG4g, and MIG7g (the same profile names used for A100, but on H100 they correspond to the sizes above). Important: Once set, the MIG profile on a node pool cannot be changed without recreating the node pool. If you chose MIG1g, all GPUs in that node pool will be partitioned into 7 slices each, and you can’t later switch those nodes to a different profile on the fly. Dynamically via NVIDIA GPU Operator: If you skipped the driver install (as we did) and are using the GPU Operator, you can let the operator manage MIG. This involves labeling the node with a desired MIG layout. For example, nvidia.com/mig.config=all-1g.10gb means “partition all GPUs into 1g.10gb slices.” The operator’s MIG Manager will then enable MIG mode on the GPUs, create the specified MIG instances, and mark the node ready when done. This approach offers flexibility – you could theoretically adjust the MIG profile by changing the label and letting the operator reconfigure (though it will drain and reboot the node to apply changes). The operator adds a taint like mig-nvidia.io/device-config=pending (or similar) during reconfiguration to prevent scheduling pods too early For our deployment, we opted to skip Azure’s automatic MIG config and use the NVIDIA operator. If you followed the steps in section 2 and set the nvidia.com/mig.config label before node creation, the node on first boot will come up, install drivers, then partition into the specified MIG profile. If not, you can label the node now and the operator will configure MIG accordingly. For example: kubectl label node <node-name> nvidia.com/mig.config=all-3g.40gb --overwrite to split each GPU into two 3g.40gb instances. The operator will detect this and partition the GPUs (the node may briefly go NotReady while MIG is being set up). After MIG is configured, verify the node’s GPU resources again. 
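One way to do that check, shown here as an illustrative example (the node name is a placeholder), is to read the allocatable resources directly from the node object:

# Show the allocatable resources reported for the MIG-enabled node.
# With the single strategy these appear as nvidia.com/gpu; with the mixed
# strategy they appear as nvidia.com/mig-<profile> resources.
kubectl get node <h100-node-name> -o jsonpath='{.status.allocatable}' | tr ',' '\n'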
Depending on the MIG strategy (see next section), you will either see a larger number of generic nvidia.com/gpu resources or specifically named resources like nvidia.com/mig-3g.40gb. We will discuss how to schedule workloads onto these MIG instances next. Important Considerations: Workload Interruption: Applying a new MIG configuration can disrupt running GPU workloads. It's advisable to drain the node or ensure that no critical workloads are running during the reconfiguration process. Node Reboot: Depending on the environment and GPU model, enabling or modifying MIG configurations might require a node reboot. Ensure that your system is prepared for potential reboots to prevent unexpected downtime. Workload Recommendations for MIG Profiles (AI/ML vs. HPC) Different MIG slicing configurations are suited to different types of workloads. Here are recommendations for AI/ML and HPC scenarios: Full GPU (MIG 7g.80gb or MIG disabled) – Best for the largest and most intensive tasks. If you are training large deep learning models (e.g. GPT-style models, complex computer vision training) or running HPC simulations that fully utilize a GPU, you should use the entire H100 GPU. The ND H100 v5 is designed to excel at these demanding workloads. In Kubernetes, you would simply schedule pods that request a whole GPU. (If MIG mode is enabled with 7g.80gb profile, each GPU is one resource unit.) This ensures maximum performance for jobs that can utilize 80 GB of GPU memory and all compute units. HPC workloads like physics simulations, CFD, weather modeling, etc., typically fall here – they are optimized to use full GPUs or even multiple GPUs in parallel, so slicing a GPU could impede their performance unless you explicitly want to run multiple smaller HPC jobs on one card. Large MIG Partitions (3g.40gb or 4g.40gb) – Good for moderately large models or jobs that don’t quite need a full H100. For instance, you can split an H100 into 2× 3g.40gb instances, each with 40 GB VRAM and ~43% of the H100’s compute. This configuration is popular for AI model serving and inference where a full H100 might be underutilized. In fact, it might happen that two MIG 3g.40gb instances on an H100 can serve models with performance equal or better than two full A100 GPUs, at a lower cost. Each 3g.40gb slice is roughly equivalent to an A100 40GB in capability, and also unlocks H100-specific features (like FP8 precision for inference). Use cases: Serving two large ML models concurrently (each model up to 40GB in size, such as certain GPT-XXL or vision models). Each model gets a dedicated MIG slice. Running two medium-sized training jobs on one physical GPU. For example, two separate experiments that each need ~40GB GPU memory can run in parallel, each on a MIG 3g.40gb. This can increase throughput for hyperparameter tuning or multi-user environments. HPC batch jobs: if you have HPC tasks that can fit in half a GPU (perhaps memory-bound tasks or jobs that only need ~50% of the GPU’s FLOPs), using two 3g.40gb instances allows two jobs to run on one GPU server concurrently with minimal interference. MIG 4g.40gb (one 40GB instance using ~57% of compute) is a less common choice by itself – since only one 4g instance can exist per GPU, it leaves some GPU capacity unused (the remaining 3/7 SMs would be idle). It might be used in a mixed profile scenario (one 4g + one 3g on the same GPU) if manually configured. 
In AKS (which uses uniform profiles per node pool), you’d typically prefer 3g.40gb if you want two equal halves, or just use full GPUs. So in practice, stick with 3g.40gb for a clean 2-way split on H100. Medium MIG Partitions (2g.20gb) – Good for multiple medium workloads. This profile yields 3 instances per GPU, each with 20 GB memory and about 28.6% of the compute. This is useful when you have several smaller training jobs or medium-sized inference tasks that run concurrently. Examples: Serving three different ML models (each ~15–20 GB in size) from one H100 node, each model on its own MIG 2g.20gb instance. Running 3 parallel training jobs for smaller models or prototyping (each job can use 20GB GPU memory). For instance, three data scientists can share one H100 GPU server, each getting what is effectively a “20GB GPU”. Each 2g.20gb MIG slice should outperform a V100 (16 GB) in both memory and compute, so this is still a hefty slice for many models. In HPC context, if you had many lighter GPU-accelerated tasks (for example, three independent tasks that each use ~1/3 of a GPU), this profile could allow them to share a node efficiently. Small MIG Partitions (1g.10gb) – Ideal for high-density inference and lightweight workloads. This profile creates 7 instances per GPU, each with 10 GB VRAM and 1/7 of the compute. It’s perfect for AI inference microservices, model ensembles, or multi-tenant GPU environments: Deploying many small models or instances of a model. For example, you could host seven different AI services (each requiring <10GB GPU memory) on one physical H100, each in its own isolated MIG slice. Most cloud providers use this to offer “fractional GPUs” to customers– e.g., a user could rent a 1g.10gb slice instead of the whole GPU. Running interactive workloads like Jupyter notebooks or development environments for multiple users on one GPU server. Each user can be assigned a MIG 1g.10gb slice for testing small-scale models or doing data science workloads, without affecting others. Inference tasks that are memory-light but require GPU acceleration – e.g., running many inference requests in parallel across MIG slices (each slice still has ample compute for model scoring tasks, and 10 GB is enough for many models like smaller CNNs or transformers). Keep in mind that 1g.10gb slices have the lowest compute per instance, so they are suited for workloads that individually don’t need the full throughput of an H100. They shine when throughput is achieved by running many in parallel. 1g.20gb profile – This one is a bit niche: 4 slices per GPU, each with 20 GB but only 1/7 of the SMs. You might use this if each task needs a large model (20 GB) but isn’t compute-intensive. An example could be running four instances of a large language model in inference mode, where each instance is constrained by memory (loading a 15-18GB model) but you deliberately limit its compute share to run more concurrently. In practice, the 2g.20gb profile (which gives the same memory per instance and more compute) might be preferable if you can utilize the extra SMs. So 1g.20gb would only make sense if you truly have compute-light, memory-heavy workloads or if you need exactly four isolated instances on one GPU. HPC Workloads Consideration: Traditional HPC jobs (MPI applications, scientific computing) typically either use an entire GPU or none. MIG can be useful in HPC for capacity planning – e.g., running multiple smaller GPU-accelerated jobs simultaneously if they don’t all require a full H100. 
But it introduces complexity, as the HPC scheduler must be aware of fractional GPUs. Many HPC scenarios might instead use whole GPUs per job for simplicity. That said, for HPC inference or analytics (like running multiple inference tasks on simulation output), MIG slicing can improve utilization. If jobs are latency-sensitive, MIG’s isolation ensures one job doesn’t impact another, which is beneficial for multi-tenant HPC clusters (for example, different teams sharing a GPU node). In summary, choose the smallest MIG slice that still meets your workload’s requirements. This maximizes overall GPU utilization and cost-efficiency by packing more tasks on the hardware. Use larger slices or full GPUs only when a job truly needs the extra memory and compute. It’s often a good strategy to create multiple GPU node pools with different MIG profiles tailored to different workload types (e.g., one pool of full GPUs for training and one pool of 1g or 2g MIG GPUs for inference). Appendix A: MIG Management via AKS Node Pool Provisioning (without GPU Operator MIG profiles) Multi-Instance GPU (MIG) allows partitioning an NVIDIA A100 (and newer) GPU into multiple instances. AKS supports MIG for compatible GPU VM sizes (such as the ND A100 v4 series), but MIG must be configured when provisioning the node pool – it cannot be changed on the fly in AKS. In this section, we show how to create a MIG-enabled node pool and integrate it with Kubernetes scheduling. We will not use the GPU Operator’s dynamic MIG reconfiguration; instead, we set MIG at node pool creation time (which is the only option on AKS). Step 1: Provision an AKS node pool with a MIG profile. Choose a MIG-capable VM size (for example, Each instance has 1/7 of the SMs and 10 GB memory (1/8 of VRAM). This yields 7 instances per GPU (7 × 10 GB = 70 GB out of 80, a small portion is reserved). Use the Azure CLI to create a new node pool and specify the --gpu-instance-profile: az aks nodepool add \ --resource-group <myResourceGroup> \ --cluster-name <myAKSCluster> \ --name migpool \ --node-vm-size Standard_ND96isr_H100_v5 \\ --node-count 1 \ --gpu-instance-profile MIG1g In this example, we create a node pool named "migpool" with MIG profile MIG1g (each physical H100 GPU is split into 7 instances of 1g/5gb each). Important: You cannot change the MIG profile after the node pool is created. If you need a different MIG configuration (e.g., 2g.10gb or 4g.20gb instances), you must create a new node pool with the desired profile. Note: MIG is only supported on Ubuntu-based AKS node pools (not on Azure Linux nodes), and currently the AKS cluster autoscaler does not support scaling MIG-enabled node pools. Plan capacity accordingly since MIG node pools can’t auto-scale. Appendix B: Key Points and Best Practices No On-the-Fly Profile Changes With AKS, once a node pool is created with --gpu-instance-profile MIGxg, you cannot switch to a different MIG layout on that same node pool. If you need a new MIG profile, create a new node pool. --skip-gpu-driver-install This is typically used if you need a specific driver version, or if you want the GPU Operator to manage drivers (instead of the in-box AKS driver). Make sure your driver is installed before you schedule GPU workloads. If the driver is missing, pods that request GPU resources will fail to initialize. Driver Versions for H100 H100 requires driver branch R525 or newer (and CUDA 12+). Verify the GPU Operator or your manual install uses a driver that supports H100 and MIG on H100 specifically. Single vs. 
Mixed Strategy
Single strategy lumps all MIG slices together as nvidia.com/gpu. This is simpler for uniform MIG node pools. Mixed strategy exposes resources like nvidia.com/mig-1g.10gb. Use it if you need explicit scheduling by MIG slice type. Configure this in the GPU Operator's Helm values (e.g., --set mig.strategy=single or mixed). If the Operator's MIG Manager is disabled, it won't attempt to reconfigure MIG, but it will still let the device plugin report the slices in single or mixed mode.

Resource Requests and Scheduling
If using the single strategy, a pod that requests nvidia.com/gpu: 1 will be allocated a single 1g.10gb MIG slice on H100. If using mixed, that same request must specifically match the MIG resource name (e.g., nvidia.com/mig-1g.10gb: 1). If your pod requests nvidia.com/gpu: 1, but the node only advertises nvidia.com/mig-1g.10gb, scheduling won't match. So be consistent in your pod specs.

Cluster Autoscaler
Currently, MIG-enabled node pools have limited or no autoscaler support on AKS (the cluster autoscaler does not fully account for MIG resources). Scale these node pools manually or via custom logic. If you rely heavily on auto-scaling, consider using a standard GPU node pool (no MIG) or carefully plan capacity to avoid needing dynamic scaling for MIG pools.

Monitoring
The GPU Operator deploys the DCGM exporter by default, which can collect MIG-specific metrics. Integrate with Prometheus + Grafana for GPU usage dashboards. MIG slices are typically identified by unique device IDs in DCGM. You can see which MIG slices are busier than others, memory usage, etc.

Node Image Upgrades
Because you're skipping the driver install from AKS, ensure you keep your GPU driver DaemonSet or Operator up to date. If you do a node image upgrade (AKS version upgrade), the OS might change, requiring a recompile or a matching driver version. The GPU Operator normally handles this seamlessly by re-installing the driver on the new node image. Test your upgrades in a staging cluster if possible, especially with new AKS releases or driver versions.

Handling Multiple Node Pools
Many users create one node pool with full GPUs (no MIG) for large jobs, and another MIG-enabled node pool for smaller parallel workloads. You can do so easily by repeating the steps above for each node pool, specifying different MIG profiles.

References
MIG User Guide
NVIDIA GPU Operator with Azure Kubernetes Service
ND-H100-v5 sizes series
Create a multi-instance GPU node pool in Azure Kubernetes Service (AKS)

Monitor OpenAI Agents SDK with Application Insights
As AI agents become more prevalent in applications, monitoring their behavior and performance becomes crucial. In this blog post, we'll explore how to monitor the OpenAI Agents SDK using Azure Application Insights through OpenTelemetry integration. Enhancing OpenAI Agents with OpenTelemetry The OpenAI Agents SDK provides powerful capabilities for building agent-based applications. By default, the SDK doesn't emit OpenTelemetry data, as noted in GitHub issue #18. This presents an opportunity to extend the SDK's functionality with robust observability features. Adding OpenTelemetry integration enables you to: Track agent interactions across distributed systems Monitor performance metrics in production Gain insights into agent behaviour Seamlessly integrate with existing observability platforms Fortunately, the Pydantic Logfire SDK has implemented an OpenTelemetry instrumentation wrapper for OpenAI Agents. This wrapper allows us to capture telemetry data and propagate it to an OpenTelemetry Collector endpoint. How It Works The integration works by wrapping the OpenAI Agents tracing provider with a Logfire-compatible wrapper that generates OpenTelemetry spans for various agent activities: Agent runs Function calls Chat completions Handoffs between agents Guardrail evaluations Each of these activities is captured as a span with relevant attributes that provide context about the operation. Implementation Example Here's how to set up the Logfire instrumentation in your application: import logfire from openai import AsyncAzureOpenAI from agents import set_default_openai_client, set_tracing_disabled # Configure your OpenAI client azure_openai_client = AsyncAzureOpenAI( api_key=os.getenv("AZURE_OPENAI_API_KEY"), api_version=os.getenv("AZURE_OPENAI_API_VERSION"), azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"), azure_deployment=os.getenv("AZURE_OPENAI_DEPLOYMENT") ) # Set as default client and enable tracing set_default_openai_client(azure_openai_client) set_tracing_disabled(False) # Configure OpenTelemetry endpoint os.environ["OTEL_EXPORTER_OTLP_TRACES_ENDPOINT"] = "http://0.0.0.0:4318/v1/traces" # Configure Logfire logfire.configure( service_name='my-agent-service', send_to_logfire=False, distributed_tracing=True ) # Instrument OpenAI Agents logfire.instrument_openai_agents() Note: The send_to_logfire=False parameter ensures that data is only sent to your OpenTelemetry collector, not to Logfire's cloud service. Environment Variables: The OTEL_EXPORTER_OTLP_TRACES_ENDPOINT environment variable tells the Logfire SDK where to send the OpenTelemetry traces. If you're using Azure Container Apps with the built-in OpenTelemetry collector, this variable will be automatically set for you. Similarly, when using AKS with auto-instrumentation enabled via the OpenTelemetry Operator, this environment variable is automatically injected into your pods. For other environments, you'll need to set it manually as shown in the example above. Setting Up the OpenTelemetry Collector To collect and forward the telemetry data to Application Insights, we need to set up an OpenTelemetry Collector. There are two ways to do this: Option 1: Run the Collector Locally Find the right OpenTelemetry Contrib Releases for your processor architecture at: https://github.com/open-telemetry/opentelemetry-collector-releases/releases/tag/v0.121.0 Only Contrib releases will support Azure Monitor exporter. 
./otelcol-contrib --config=otel-collector-config.yaml Option 2: Run the Collector in Docker docker run --rm \ -v $(pwd)/otel-collector-config.yaml:/etc/otelcol-contrib/config.yaml \ -p 4318:4318 \ -p 55679:55679 \ otel/opentelemetry-collector-contrib:latest Collector Configuration Here's a basic configuration for the OpenTelemetry Collector that forwards data to Azure Application Insights: receivers: otlp: protocols: http: endpoint: "0.0.0.0:4318" exporters: azuremonitor: connection_string: "InstrumentationKey=your-instrumentation-key;IngestionEndpoint=https://your-region.in.applicationinsights.azure.com/" maxbatchsize: 100 maxbatchinterval: 10s debug: verbosity: basic service: pipelines: traces: receivers: [otlp] exporters: [azuremonitor, debug] Important: Replace connection_string with your actual Application Insights connection string. What You Can Monitor With this setup, you can monitor various aspects of your OpenAI Agents in Application Insights: Agent Performance: Track how long each agent takes to process requests Model Usage: Monitor which AI models are being used and their response times Function Calls: See which tools/functions are being called by agents Handoffs: Track when agents hand off tasks to other specialized agents Errors: Identify and diagnose failures in agent processing End-to-End Traces: Follow user requests through your entire system Example Trace Visualisation In Application Insights, you can visualise the traces as a hierarchical timeline, showing the flow of operations: Known Issue: Span Name Display in Application Insights When using LogFire SDK 3.8.1 with Application Insights, you might notice that span names appear as message templates (with regular expressions) instead of showing the actual agent or model names. This makes it harder to identify specific spans in the Application Insights UI. Issue: In the current implementation of LogFire SDK's OpenAI Agents integration (source code), the message template is used as the span's name, resulting in spans being displayed with placeholders like {name!r} or {gen_ai.request.model!r} instead of actual values. Temporary Fix Until LogFire SDK introduces a fix, you can modify the /logfire/_internal/integrations/openai_agents.py file to properly format the span names. This is after pip install logfire, the file will usually be at venv/lib/python3.11/site-packages/logfire/_internal/integrations/openai_agents.py Replace the span creation code around line 100: Original code logfire_span = self.logfire_instance.span( msg_template, **attributes_from_span_data(span_data, msg_template), **extra_attributes, _tags=['LLM'] * isinstance(span_data, GenerationSpanData), ) Modified code with setting Span name as message attributes = attributes_from_span_data(span_data, msg_template) message = logfire_format(msg_template, dict(attributes or {}), NOOP_SCRUBBER) logfire_span = self.logfire_instance.span( msg_template, _span_name=message, **attributes, **extra_attributes, _tags=['LLM'] * isinstance(span_data, GenerationSpanData), ) This change formats the message template with actual values and sets it as the span name, making it much easier to identify spans in the Application Insights UI. After applying this fix, your spans will display meaningful names like "Chat completion with 'gpt-4o'" instead of "Chat completion with {gen_ai.request.model!r}". Limitation: Even after applying this fix, HandOff spans will still not show the correct to_agent field in the span name. 
This occurs because the to_agent field is not set during initial span creation but later in the on_ending method of the LogfireSpanWrapper class: @dataclass class LogfireSpanWrapper(LogfireWrapperBase[Span[TSpanData]], Span[TSpanData]): # ... def on_ending(self): # This is where to_agent gets updated, but too late for the span name # ... Until LogFire SDK optimizes this behavior, you can still see the correct HandOff values by clicking on the span and looking at the logfire.msg property. For example, you'll see "Handoff: Customer Service Agent -> Investment Specialist" in the message property even if the span name doesn't show it correctly. Auto-Instrumentation for AKS Azure Kubernetes Service (AKS) offers a codeless way to enable OpenTelemetry instrumentation for your applications. This approach simplifies the setup process and ensures that your OpenAI Agents can send telemetry data without requiring manual instrumentation. How to Enable Auto-Instrumentation To enable auto-instrumentation for Python applications in AKS, you can add an annotation to your pod specification: annotations: instrumentation.opentelemetry.io/inject-python: 'true' This annotation tells the OpenTelemetry Operator to inject the necessary instrumentation into your Python application. For more details, refer to the following resources: Microsoft Learn: Codeless application monitoring for Kubernetes OpenTelemetry Docs: Automatic Instrumentation for Kubernetes Built-in Managed OpenTelemetry Collector in Azure Container Apps Azure Container Apps provides a built-in Managed OpenTelemetry Collector that simplifies the process of collecting and forwarding telemetry data to Application Insights. This eliminates the need to deploy and manage your own collector instance. Setting Up the Managed Collector When you enable the built-in collector, Azure Container Apps automatically sets the OTEL_EXPORTER_OTLP_ENDPOINT environment variable for your applications. This allows the Logfire SDK to send traces to the collector without any additional configuration. Here's an example of enabling the collector in an ARM template: { "type": "Microsoft.App/containerApps", "properties": { "configuration": { "dapr": {}, "ingress": {}, "observability": { "applicationInsightsConnection": { "connectionString": "InstrumentationKey=your-instrumentation-key" } } } } } For more information, check out these resources: Microsoft Learn: OpenTelemetry agents in Azure Container Apps Tech Community: How to monitor applications by using OpenTelemetry on Azure Container Apps Conclusion Monitoring OpenAI Agents with Application Insights provides valuable insights into your AI systems' performance and behavior. By leveraging the Pydantic Logfire SDK's OpenTelemetry instrumentation and the OpenTelemetry Collector, you can gain visibility into your agents' operations and ensure they're functioning as expected. This approach allows you to integrate AI agent monitoring into your existing observability stack, making it easier to maintain and troubleshoot complex AI systems in production environments. 
Resources

Implementation can be found at https://github.com/hieumoscow/azure-openai-agents

References:
OpenAI Agents Python SDK
GitHub Issue: OpenAI Agents Logging
OpenTelemetry Collector Documentation
Azure Application Insights Documentation
Codeless application monitoring for Kubernetes
OpenTelemetry Automatic Instrumentation for Kubernetes
OpenTelemetry agents in Azure Container Apps
How to monitor applications using OpenTelemetry on Azure Container Apps

Announcing a flexible, predictable billing model for Azure SRE Agent
Billing for Azure SRE Agent will start on September 1, 2025. Announced at Microsoft Build 2025, Azure SRE Agent is a pre-built AI agent for root cause analysis, uptime improvement, and operational cost reduction. Learn more about the billing model and example scenarios.

Private Pod Subnets in AKS Without Overlay Networking
When deploying AKS clusters, a common concern is the amount of IP address space required. If you are deploying your AKS cluster into your corporate network, the size of the IP address space you can obtain may be quite small, which can cause problems with the number of pods you are able to deploy. The simplest and most common solution to this is to use an overlay network, which is fully supported in AKS. In an overlay network, pods are deployed to a private, non-routed address space that can be as large as you want. Translation between the routable and non-routed network is handled by AKS. For most people, this is the best option for dealing with IP addressing in AKS, and there is no need to complicate things further. However, there are some limitations with overlay networking, primarily that you cannot address the pods directly from the rest of the network— all inbound communication must go via services. There are also some advanced features that are not supported, such as Virtual Nodes. If you are in a scenario where you need some of these features, and overlay networking will not work for you, it is possible to use the more traditional vNet-based deployment method, with some tweaks. Azure CNI Pod Subnet The alternative to using the Azure CNI Overlay is to use the Azure CNI Pod Subnet. In this setup, you deploy a vNet with two subnets - one for your nodes and one for pods. You are in control of the IP address configuration for these subnets. To conserve IP addresses, you can create your pod subnet using an IP range that is not routable to the rest of your corporate network, allowing you to make it as large as you like. The node subnet remains routable from your corporate network. In this setup, if you want to talk to the pods directly, you would need to do so from within the AKS vNet or peer another network to your pod subnet. You would not be able to address these pods from the rest of your corporate network, even without using overlay networking. The Routing Problem When you deploy a setup using Azure CNI Pod Subnet, all the subnets in the vNet are configured with routes and can talk to each other. You may wish to connect this vNet to other Azure vNets via peering, or to your corporate network using ExpressRoute or VPN. However, where you will encounter an issue is if your pods try to connect to resources outside of your AKS vNet but inside your corporate network, or any peered Azure vNets (which are not peered to this isolated subnet). In this scenario, the pods will route their traffic directly out of the vNet using their private IP address. This private IP is not a valid, routable IP, so the resources on the other network will not be able to reply, and the request will fail. IP Masquerading To resolve this issue, we need a way to have traffic going to other networks present a private IP that is routable within the network. This can be achieved through several methods. One method would be to introduce a separate solution for routing this traffic, such as Azure Firewall or another Network Virtual Appliance (NVA). Traffic is routable between the pod and node subnet, so the pod can send its requests to the firewall, and then the requests to the remote network come from the IP of the firewall, which is routable. This solution will work but does require another resource to be deployed, with additional costs. If you are already using an Azure Firewall for outbound traffic, then this may be something you could use, but we are looking for a simpler and more cost-effective solution. 
The Routing Problem

When you deploy a setup using Azure CNI Pod Subnet, all the subnets in the vNet are configured with routes and can talk to each other. You may wish to connect this vNet to other Azure vNets via peering, or to your corporate network using ExpressRoute or VPN. The issue arises when your pods try to connect to resources outside the AKS vNet but inside your corporate network, or in peered Azure vNets that are not peered to this isolated subnet. In this scenario, the pods route their traffic directly out of the vNet using their private IP address. Because this private IP is not a valid, routable IP, the resources on the other network cannot reply, and the request fails.

IP Masquerading

To resolve this issue, we need traffic going to other networks to present a private IP that is routable within the network. This can be achieved in several ways. One method is to introduce a separate solution for routing this traffic, such as Azure Firewall or another Network Virtual Appliance (NVA). Traffic is routable between the pod and node subnets, so the pod can send its requests to the firewall, and the requests to the remote network then come from the IP of the firewall, which is routable. This solution works, but it requires another resource to be deployed, with additional costs. If you are already using an Azure Firewall for outbound traffic, this may be an option, but we are looking for a simpler and more cost-effective solution.

Rather than implementing another device to present a routable IP, we can use the nodes of our AKS cluster. The AKS nodes sit in the routable node subnet, so ideally we want outbound traffic from the pods to use the node IP whenever it needs to leave the vNet for the rest of the private network. There are several ways to achieve this: you could use Egress Gateway services through tools like Istio, or you could change the iptables configuration on the nodes using a DaemonSet. In this article, we will focus on a tool called ip-masq-agent-v2.

This tool provides a means for traffic to "masquerade" as coming from the IP address of the node it is running on, with the node performing Network Address Translation (NAT). If you deploy a cluster with an overlay network, this tool is already deployed and configured on your cluster; it is what Microsoft uses to configure NAT for traffic leaving the overlay network. On pod subnet clusters, the tool is not deployed by default, but you can deploy it yourself to provide the same functionality. Under the hood, it makes changes to iptables using a DaemonSet that runs on each node, so you could replicate this behaviour yourself, but ip-masq-agent-v2 provides a simpler process that has been tested with AKS through overlay networking. The Microsoft v2 version is based on the original Kubernetes contribution and aims to solve more specific networking cases, allow for more configuration options, and improve observability.

Deploy ip-masq-agent-v2

There are two parts to deploying the agent. First, we deploy the agent itself, which runs as a DaemonSet, spawning a pod on each node in the cluster. This is important, as each node needs to have its iptables altered by the tool, and the agent needs to run whenever a new node is created.

To deploy the agent, we create the DaemonSet in our cluster. The ip-masq-agent-v2 repo includes several examples, including an example of deploying the DaemonSet. The example is slightly out of date on the version of ip-masq-agent-v2 to use, so make sure you update it to the latest version. If you would prefer to build and manage your own containers, the repository also includes a Dockerfile to allow you to do this.

Below is an example deployment using the Microsoft-hosted images. It references the ConfigMap we will create in the next step, and it is important that the same name is used as is referenced here.

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: ip-masq-agent
  namespace: kube-system
  labels:
    component: ip-masq-agent
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
spec:
  selector:
    matchLabels:
      k8s-app: ip-masq-agent
  template:
    metadata:
      labels:
        k8s-app: ip-masq-agent
    spec:
      hostNetwork: true
      containers:
        - name: ip-masq-agent
          image: mcr.microsoft.com/aks/ip-masq-agent-v2:v0.1.15
          imagePullPolicy: Always
          securityContext:
            privileged: false
            capabilities:
              add: ["NET_ADMIN", "NET_RAW"]
          volumeMounts:
            - name: ip-masq-agent-volume
              mountPath: /etc/config
              readOnly: true
      volumes:
        - name: ip-masq-agent-volume
          projected:
            sources:
              - configMap:
                  name: ip-masq-agent-config
                  optional: true
                  items:
                    - key: ip-masq-agent
                      path: ip-masq-agent
                      mode: 0444

Once you deploy this DaemonSet, you should see an instance of the agent running on each node in your cluster.
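As a quick check that the rollout worked, you can apply the manifest and confirm that one agent pod is running per node. The filename below is simply an assumed local copy of the example manifest above.

# Assumes the DaemonSet example above was saved locally as ip-masq-agent.yaml
kubectl apply -f ip-masq-agent.yaml

# The DaemonSet should report one ready pod per node
kubectl get daemonset ip-masq-agent -n kube-system
kubectl get pods -n kube-system -l k8s-app=ip-masq-agent -o wide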
Create Configuration

Next, we need to create a ConfigMap containing any configuration we want to vary from the defaults deployed with the agent. The main thing to configure is which destination IP ranges should, and should not, be masqueraded behind the node IP. The default deployment of ip-masq-agent-v2 disables masquerading for all three private IP ranges specified by RFC 1918 (10.0.0.0/8, 172.16.0.0/12, and 192.168.0.0/16). In our example above, traffic to the 10.1.64.0/18 subnet in the app network would therefore not be masqueraded, and our routing problem would still exist.

We need to amend the configuration so that these private IPs are masqueraded. However, we do want to avoid masquerading within our AKS network, as this traffic needs to come from the pod IPs. Therefore, we need to ensure we do not masquerade traffic going from the pods to:

- The pod subnet
- The node subnet
- The AKS service CIDR range, used for internal networking in AKS

To do this, we add these IP ranges to the nonMasqueradeCIDRs array in the configuration. This is the list of destination IP ranges for which traffic will continue to come from the pod IP rather than the node IP. The configuration also allows us to define whether to masquerade the link-local IPs, which we do not want to do. Below is an example ConfigMap that works for the setup detailed above.

apiVersion: v1
kind: ConfigMap
metadata:
  name: ip-masq-agent-config
  namespace: kube-system
  labels:
    component: ip-masq-agent
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: EnsureExists
data:
  ip-masq-agent: |-
    nonMasqueradeCIDRs:
      - 10.0.0.0/16 # Entire vNet and service CIDR
      - 192.168.0.0/16 # Pod subnet
    masqLinkLocal: false
    masqLinkLocalIPv6: false

There are a few things to be aware of here:

- The node subnet and AKS service CIDR are two contiguous address spaces in my setup, so both are covered by 10.0.0.0/16. I could have called them out separately.
- 192.168.0.0/16 covers the whole of my pod subnet.
- I do not enable masquerading on link-local addresses.
- The ConfigMap needs to be created in the same namespace as the DaemonSet.
- The ConfigMap name needs to match the name referenced in the volume mount in the DaemonSet manifest.

Once you apply this configuration, the agent will pick up the changes within around 60 seconds. After that, you should find that traffic going to private addresses outside the list of nonMasqueradeCIDRs now presents from the node IP.
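The snippet below is one way to apply the ConfigMap and sanity-check the result. The filename is an assumed local copy of the example above, and the debug image and iptables chain name reflect common AKS and ip-masq-agent defaults rather than anything specific to this article, so verify them in your environment.

# Assumes the ConfigMap example above was saved locally as ip-masq-agent-config.yaml
kubectl apply -f ip-masq-agent-config.yaml

# After ~60 seconds, the agent logs should show the new configuration being loaded
kubectl logs -n kube-system -l k8s-app=ip-masq-agent --tail=20

# Optionally, inspect the NAT rules the agent programs on a node.
# ip-masq-agent writes its rules to a dedicated chain (typically IP-MASQ-AGENT) in the nat table.
kubectl debug node/<node-name> -it \
  --image=mcr.microsoft.com/cbl-mariner/busybox:2.0 -- \
  chroot /host iptables -t nat -L IP-MASQ-AGENT -n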
Summary

If you’re deploying AKS into an IP-constrained environment, overlay networking is generally the best and simplest option. It allows you to use non-routed pod IP ranges, conserve address space, and avoid complex routing considerations without additional configuration. If you can use it, it should be your default approach.

However, there are cases where overlay networking will not meet your needs. You might require features only available with pod subnet mode, such as the ability to send traffic directly to pods and nodes without tunnelling, or support for features like Virtual Nodes. In these situations, you can still keep your pod subnet private and non-routed by carefully controlling IP masquerading. With ip-masq-agent-v2, you can configure which destinations should (and should not) be NAT’d, keeping the subnet isolated while maintaining the functionality you need.

Simplifying Outbound Connectivity Troubleshooting in AKS with Connectivity Analysis (Preview)

We are announcing the Connectivity Analysis feature for AKS, now in Public Preview and available through the AKS portal. You can use Connectivity Analysis (Preview) to quickly verify whether outbound traffic from your AKS nodes is being blocked by Azure network resources such as Azure Firewall, Network Security Groups (NSGs), route tables, and more.

Azure at KubeCon India 2025 | Hyderabad, India – 6-7 August 2025
Welcome to KubeCon + CloudNativeCon India 2025! We’re thrilled to join this year’s event in Hyderabad as a Gold sponsor, where we’ll be highlighting the newest innovations in Azure and Azure Kubernetes Service (AKS) while connecting with India’s dynamic cloud-native community. We’re excited to share some powerful new AKS capabilities that bring AI innovation to the forefront, strengthen security and networking, and make it easier than ever to scale and streamline operations.

Innovate with AI

AI is increasingly central to modern applications and competitive innovation, and AKS is evolving to support intelligent agents more natively. The AKS Model Context Protocol (MCP) server, now in public preview, introduces a unified interface that abstracts Kubernetes and Azure APIs, allowing AI agents to manage clusters more easily across environments. This simplifies diagnostics and operations, even across multiple clusters, and is fully open source, making it easier to integrate AI-driven tools into Kubernetes workflows.

Enhance networking capabilities

Networking is foundational to application performance and security. This wave of AKS improvements delivers more control, simplicity, and scalability in networking:

- Traffic between AKS services can now be filtered by HTTP methods, paths, and hostnames using Layer-7 network policies, enabling precise control and stronger zero-trust security.
- Built-in HTTP proxy management simplifies cluster-wide proxy configuration and allows easy disabling of proxies, reducing misconfigurations while preserving future settings.
- Private AKS clusters can be accessed securely through Azure Bastion integration, eliminating the need for VPNs or public endpoints by tunneling directly with kubectl.
- DNS performance and resilience are improved with LocalDNS for AKS, which enables pods to resolve names even during upstream DNS outages, with no changes to workloads.
- Outbound traffic from AKS can now use static egress IP prefixes, ensuring predictable IPs for compliance and smoother integration with external systems.
- Cluster scalability is enhanced by supporting multiple Standard Load Balancers, allowing traffic isolation and avoiding rule limits by assigning SLBs to specific node pools or services.
- Network troubleshooting is streamlined with Azure Virtual Network Verifier, which runs connectivity tests from AKS to external endpoints and identifies misconfigured firewalls or routes.

Strengthen security posture

Security remains a foundational priority for Kubernetes environments, especially as workloads scale and diversify. The following enhancements strengthen protection for data, infrastructure, and applications running in AKS, addressing key concerns around isolation, encryption, and visibility:

- Confidential VMs for Azure Linux enable containers to run on hardware-encrypted, isolated VMs using AMD SEV-SNP, providing data-in-use protection for sensitive workloads without requiring code changes.
- Confidential VMs for Ubuntu 24.04 combine AKS’s managed Kubernetes with memory encryption and VM-level isolation, offering enhanced security for Linux containers in Ubuntu-based clusters.
- Encryption in transit for NFS secures data between AKS pods and Azure Files NFS volumes using TLS 1.3, protecting sensitive information without modifying applications.
- Web Application Firewall for Containers adds OWASP rule-based protection to containerized web apps via Azure Application Gateway, blocking common exploits without separate WAF appliances.
- The AKS Security Dashboard in the Azure portal centralizes visibility into vulnerabilities, misconfigurations, compliance gaps, and runtime threats, simplifying cluster security management through Defender for Cloud.

Simplify and scale operations

To streamline operations at scale, AKS is introducing new capabilities that automate resource provisioning, enforce deployment best practices, and simplify multi-tenant management, making it easier to maintain performance and consistency across complex environments:

- Node Auto-Provisioning improves resource efficiency by automatically adding and removing standalone nodes based on pod demand, eliminating the need for pre-created node pools during traffic spikes.
- Deployment Safeguards help prevent misconfigurations by validating Kubernetes manifests against best practices and optionally enforcing corrections to reduce instability and security risks.
- Managed Namespaces streamline multi-tenant cluster operations by providing a unified view of accessible namespaces across AKS clusters, along with quick access credentials via CLI, API, or Portal.

Maximize performance and visibility

To enhance performance and observability in large-scale environments, AKS is also rolling out infrastructure-level upgrades that improve monitoring capacity and control plane efficiency:

- Prometheus quotas in Azure Monitor can now be raised to 20 million samples per minute or active time series, ensuring full metric coverage for massive AKS deployments.
- Control plane performance has been improved with a backported Kubernetes enhancement (KEP-5116), reducing API server memory usage by ~10× during large listings and enabling faster kubectl responses with lower risk of OOM issues in AKS versions 1.31.9 and above.

Microsoft is at KubeCon India 2025 - come say hi!

Connect with us in Hyderabad! Microsoft has a strong on-site presence at KubeCon + CloudNativeCon India 2025. Here are some highlights of how you can connect with us at the event:

- August 6-7: Visit Microsoft at Booth G4 for live demos and expert Q&A throughout the conference. Microsoft engineers are also delivering several breakout sessions on AKS and cloud-native technologies.
- Microsoft sessions: Throughout the conference, Microsoft engineers are speaking in various sessions, including:
  - Keynote: The Last Mile Problem: Why AI Won’t Replace You (Yet)
  - Lightning Talk: Optimizing SNAT Port and IP Address Management in Kubernetes
  - Smart Capacity-Aware Volume Provisioning for LVM Local Storage Across Multi-Cluster Kubernetes Fleet
  - Minimal OS, Maximum Impact: Journey To a Flatcar Maintainer

We’re thrilled to connect with you at KubeCon + CloudNativeCon India 2025. Whether you attend sessions, drop by our booth, or watch the keynote, we look forward to discussing these announcements and hearing your thoughts. Thank you for being part of the community, and happy KubeCon! 👋

General Availability of Azure Monitor Network Security Perimeter Features
We’re excited to announce that Azure Monitor Network Security Perimeter features are now generally available! This update is an important step forward for Azure Monitor’s security, providing comprehensive network isolation for your monitoring data. In this post, we’ll explain what Network Security Perimeter is, why it matters, and how it benefits Azure Monitor users.

Network Security Perimeter is purpose-built to strengthen network security and monitoring, enabling customers to establish a more secure and isolated environment. As enterprise interest grows, it’s clear that this feature will play a key role in elevating the protection of Azure PaaS resources against evolving security threats.

What is Network Security Perimeter and Why Does It Matter?

Network Security Perimeter is a network isolation feature for Azure PaaS services that creates a trusted boundary around your resources. Azure Monitor’s key components (like Log Analytics workspaces and Application Insights) run outside of customer virtual networks; Network Security Perimeter allows these services to communicate only within an explicit perimeter and blocks any unauthorized public access. In essence, the perimeter acts as a virtual firewall at the Azure service level: by default it restricts public network access to resources inside the perimeter, and only permits traffic that meets your defined rules. This prevents unwanted network connections and helps prevent data exfiltration, so sensitive monitoring data stays within your control.

For Azure Monitor customers, Network Security Perimeter is a game-changer. It addresses a common ask from enterprises for “zero trust” network security on Azure’s monitoring platform. Previously, while you could use Private Link to secure traffic from your VNets to Azure Monitor, Azure Monitor’s own service endpoints were still accessible over the public internet. The perimeter closes that gap by enforcing network controls on Azure’s side. This means you can lock down your Log Analytics workspace or Application Insights resource to only accept data from specific sources (e.g. certain IP ranges, or other resources in your perimeter) and only send data out to authorized destinations. If anything or anyone outside those rules attempts to access your monitoring resources, Network Security Perimeter will deny it and log the attempt for auditing.

In short, Network Security Perimeter brings a new level of security to Azure Monitor: it allows organizations to create a logical network boundary around their monitoring resources, much like a private enclave. This is crucial for customers in regulated industries (finance, government, healthcare) who need to ensure their cloud services adhere to strict network isolation policies. By using the perimeter, Azure Monitor can be safely deployed in environments that demand no public exposure and thorough auditing of network access. It’s an important step in strengthening Azure Monitor’s security posture and aligning with enterprise zero-trust networking principles.

Key Benefits of Network Security Perimeter in Azure Monitor

With Network Security Perimeter now generally available, Azure Monitor users gain several powerful capabilities:
🔒 Enhanced Security & Data Protection: Azure PaaS resources in a perimeter can communicate freely with each other, but external access is blocked by default. You define explicit inbound/outbound rules for any allowed public traffic, ensuring no unauthorized network access to your Log Analytics workspaces, Application Insights components, or other perimeter resources. This greatly reduces the risk of data exfiltration and unauthorized access to monitoring data.

⚖️ Granular Access Control: Network Security Perimeter supports fine-grained rules to tailor access. You can allow inbound access by specific IP address ranges or Azure subscription IDs, and allow outbound calls to specific Fully Qualified Domain Names (FQDNs). For example, you might permit only your corporate IP range to send telemetry to a workspace, or allow a workspace to send data out only to contoso-api.azurewebsites.net. This level of control ensures that only trusted sources and destinations are used.

📜 Comprehensive Logging & Auditing: Every allowed or denied connection governed by Network Security Perimeter can be logged. Azure Monitor’s Network Security Perimeter integration provides unified access logs for all resources in the perimeter. These logs give you visibility into exactly what connections were attempted, from where, and whether they were permitted or blocked. This is invaluable for auditing and compliance – for instance, proving that no external IPs accessed your workspace, or detecting unexpected outbound calls. The logs can be sent to a Log Analytics workspace or storage for retention and analysis.

🔧 Seamless Integration with Azure Monitor Services: Network Security Perimeter is natively integrated across Azure Monitor’s services and workflows. Log Analytics workspaces and Application Insights components support Network Security Perimeter out of the box, meaning ingestion, queries, and alerts all enforce perimeter rules behind the scenes. Azure Monitor Alerts (scheduled query rules) and Action Groups also work with Network Security Perimeter, so that alert notifications or automation actions respect the perimeter (for example, an alert sending to an Event Hub will check Network Security Perimeter rules). This end-to-end integration ensures that securing your monitoring environment with Network Security Perimeter doesn’t break any functionality – everything continues to work, but within your defined security boundary.

🤝 Consistent, Centralized Management: Network Security Perimeter introduces a uniform way to manage network access for multiple resources. You can group resources from different services (and even different subscriptions) into one perimeter and manage network rules in one place. This “single pane of glass” approach simplifies operations: network admins can define a perimeter once and apply it to all relevant Azure Monitor components (and other supported services). It’s a more scalable and consistent method than maintaining disparate firewall settings on each service. Network Security Perimeter uses Azure’s standard API and portal experience, so setting up a perimeter and rules is straightforward.

🌐 No-Compromise Isolation (with Private Link): Network Security Perimeter complements existing network security options. If you’re already using Azure Private Link to keep traffic off the internet, Network Security Perimeter adds another layer of protection. Private Link secures traffic between your VNet and Azure Monitor; Network Security Perimeter secures Azure Monitor’s service endpoints themselves.
Used together, you achieve defense-in-depth: for example, a workspace can be accessible only via private endpoint and only accept data from certain sources due to Network Security Perimeter rules. This layered approach helps meet even the most stringent security requirements.

In conclusion, Network Security Perimeter for Azure Monitor provides strong network isolation, flexible control, and visibility, all integrated into the Azure platform. It helps organizations confidently use Azure Monitor in scenarios where they need to lock down network access and simplify compliance. For detailed information on configuring Azure Monitor with a Network Security Perimeter, please refer to the following link: Configure Azure Monitor with Network Security Perimeter.
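To give a feel for the configuration flow described in the linked documentation, here is a rough sketch using the Azure CLI nsp extension: create a perimeter and profile, add an inbound access rule, and associate a Log Analytics workspace with that profile. Treat the command and parameter names as assumptions to validate against the documentation; the resource names, IP range, and placeholder IDs are purely illustrative.

# Sketch only - based on the Azure CLI "nsp" extension; verify exact syntax in the linked docs.
az extension add --name nsp

# 1. Create the perimeter and a profile to hold its access rules
az network perimeter create --name nsp-monitor --resource-group rg-monitor --location eastus
az network perimeter profile create --name profile-monitor --perimeter-name nsp-monitor --resource-group rg-monitor

# 2. Allow inbound traffic only from a trusted public IP range (placeholder range shown)
az network perimeter profile access-rule create \
  --name allow-corp-range \
  --profile-name profile-monitor \
  --perimeter-name nsp-monitor \
  --resource-group rg-monitor \
  --address-prefixes "['203.0.113.0/24']"

# 3. Associate a Log Analytics workspace with the profile (start in Learning mode, then move to Enforced)
az network perimeter association create \
  --name assoc-workspace \
  --perimeter-name nsp-monitor \
  --resource-group rg-monitor \
  --access-mode Learning \
  --private-link-resource "{id:<log-analytics-workspace-resource-id>}" \
  --profile "{id:<perimeter-profile-resource-id>}"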