azure managed grafana
General Availability of Azure Monitor Network Security Perimeter Features
We’re excited to announce that Azure Monitor Network Security Perimeter features are now generally available! This update is an important step forward for Azure Monitor’s security, providing comprehensive network isolation for your monitoring data. In this post, we’ll explain what Network Security Perimeter is, why it matters, and how it benefits Azure Monitor users.

Network Security Perimeter is purpose-built to strengthen network security and monitoring, enabling customers to establish a more secure and isolated environment. As enterprise interest grows, it’s clear that this feature will play a key role in elevating the protection of Azure PaaS resources against evolving security threats.

What is Network Security Perimeter and Why Does It Matter?

Network Security Perimeter is a network isolation feature for Azure PaaS services that creates a trusted boundary around your resources. Azure Monitor’s key components (like Log Analytics workspaces and Application Insights) run outside of customer virtual networks; Network Security Perimeter allows these services to communicate only within an explicit perimeter and blocks any unauthorized public access. In essence, the security perimeter acts as a virtual firewall at the Azure service level: by default it restricts public network access to resources inside the perimeter, and only permits traffic that meets your defined rules. This prevents unwanted network connections and helps prevent data exfiltration, so sensitive monitoring data stays within your control.

For Azure Monitor customers, Network Security Perimeter is a game-changer. It addresses a common ask from enterprises for “zero trust” network security on Azure’s monitoring platform. Previously, while you could use Private Link to secure traffic from your VNets to Azure Monitor, Azure Monitor’s own service endpoints were still accessible over the public internet. The security perimeter closes that gap by enforcing network controls on Azure’s side.
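The default-deny behavior described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how the service is implemented: the function name is hypothetical, the IP ranges are documentation-only example ranges, and IP-range rules are just one of the rule types the post mentions.

```python
import ipaddress


def is_inbound_allowed(source_ip, allowed_ranges):
    """Default-deny check: allow only sources inside an explicitly allowed CIDR range.

    Hypothetical helper for illustration; it models the concept of a
    perimeter inbound rule, not the Azure service's actual logic.
    """
    address = ipaddress.ip_address(source_ip)
    # With no rules at all, any() over an empty sequence is False: everything is denied.
    return any(address in ipaddress.ip_network(cidr) for cidr in allowed_ranges)


if __name__ == "__main__":
    corporate_ranges = ["203.0.113.0/24"]  # example range, assumed for the demo
    for ip in ("203.0.113.10", "198.51.100.7"):
        verdict = "allow" if is_inbound_allowed(ip, corporate_ranges) else "deny"
        print(ip, verdict)
```

The point of the sketch is the shape of the decision: nothing is reachable until a rule says so, which is the opposite of a public endpoint with an optional block list.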
This means you can lock down your Log Analytics workspace or Application Insights to only accept data from specific sources (e.g. certain IP ranges, or other resources in your perimeter) and only send data out to authorized destinations. If anything or anyone outside those rules attempts to access your monitoring resources, Network Security Perimeter will deny it and log the attempt for auditing.

In short, Network Security Perimeter brings a new level of security to Azure Monitor: it allows organizations to create a logical network boundary around their monitoring resources, much like a private enclave. This is crucial for customers in regulated industries (finance, government, healthcare) who need to ensure their cloud services adhere to strict network isolation policies. By using the security perimeter, Azure Monitor can be safely deployed in environments that demand no public exposure and thorough auditing of network access. It’s an important step in strengthening Azure Monitor’s security posture and aligning with enterprise zero-trust networking principles.

Key Benefits of Network Security Perimeter in Azure Monitor

With Network Security Perimeter now generally available, Azure Monitor users gain several powerful capabilities:

🔒 Enhanced Security & Data Protection: Azure PaaS resources in a perimeter can communicate freely with each other, but external access is blocked by default. You define explicit inbound/outbound rules for any allowed public traffic, ensuring no unauthorized network access to your Log Analytics workspaces, Application Insights components, or other perimeter resources. This greatly reduces the risk of data exfiltration and unauthorized access to monitoring data.

⚖️ Granular Access Control: Network Security Perimeter supports fine-grained rules to tailor access. You can allow inbound access by specific IP address ranges or Azure subscription IDs, and allow outbound calls to specific Fully Qualified Domain Names (FQDNs).
For example, you might permit only your corporate IP range to send telemetry to a workspace, or allow a workspace to send data out only to contoso-api.azurewebsites.net. This level of control ensures that only trusted sources and destinations are used.

📜 Comprehensive Logging & Auditing: Every allowed or denied connection governed by Network Security Perimeter can be logged. Azure Monitor’s Network Security Perimeter integration provides unified access logs for all resources in the perimeter. These logs give you visibility into exactly what connections were attempted, from where, and whether they were permitted or blocked. This is invaluable for auditing and compliance, for instance proving that no external IPs accessed your workspace, or detecting unexpected outbound calls. The logs can be sent to a Log Analytics workspace or storage for retention and analysis.

🔧 Seamless Integration with Azure Monitor Services: Network Security Perimeter is natively integrated across Azure Monitor’s services and workflows. Log Analytics workspaces and Application Insights components support Network Security Perimeter out-of-the-box, meaning ingestion, queries, and alerts all enforce perimeter rules behind the scenes. Azure Monitor Alerts (scheduled query rules) and Action Groups also work with Network Security Perimeter, so that alert notifications or automation actions respect the perimeter (for example, an alert sending to an Event Hub will check Network Security Perimeter rules). This end-to-end integration ensures that securing your monitoring environment with Network Security Perimeter doesn’t break any functionality: everything continues to work, but within your defined security boundary.

🤝 Consistent, Centralized Management: Network Security Perimeter introduces a uniform way to manage network access for multiple resources. You can group resources from different services (and even different subscriptions) into one perimeter and manage network rules in one place.
This “single pane of glass” approach simplifies operations: network admins can define a perimeter once and apply it to all relevant Azure Monitor components (and other supported services). It’s a more scalable and consistent method than maintaining disparate firewall settings on each service. Network Security Perimeter uses Azure’s standard API and portal experience, so setting up a perimeter and rules is straightforward.

🌐 No-Compromise Isolation (with Private Link): Network Security Perimeter complements existing network security options. If you’re already using Azure Private Link to keep traffic off the internet, Network Security Perimeter adds another layer of protection. Private Link secures traffic between your VNet and Azure Monitor; Network Security Perimeter secures Azure Monitor’s service endpoints themselves. Used together, you achieve defense-in-depth: e.g., a workspace can be accessible only via private endpoint and only accept data from certain sources due to Network Security Perimeter. This layered approach helps meet even the most stringent security requirements.

In conclusion, Network Security Perimeter for Azure Monitor provides strong network isolation, flexible control, and visibility, all integrated into the Azure platform. It helps organizations confidently use Azure Monitor in scenarios where they need to lock down network access and simplify compliance.

For detailed information on configuring Azure Monitor with a Network Security Perimeter, please refer to the following link: Configure Azure Monitor with Network Security Perimeter.

Azure Monitor Private Link Scope (AMPLS) Scale Limits Increased by 10x!
What is Azure Monitor Private Link Scope (AMPLS)?

Azure Monitor Private Link Scope (AMPLS) is a feature that allows you to securely connect Azure Monitor resources to your virtual network using private endpoints. This ensures that your monitoring data is accessed only through authorized private networks, preventing data exfiltration and keeping all traffic inside the Azure backbone network.

AMPLS: Scale Limits Increased by 10x in Public Cloud (Public Preview)

We are excited to share that the scale limits for Azure Monitor Private Link Scope (AMPLS) have been increased tenfold (10x) in Public Cloud regions as part of the Public Preview! This substantial enhancement empowers our customers to manage their resources more efficiently and securely with private links using AMPLS, ensuring that workload logs are routed via the Microsoft backbone network.

Addressing Customer Challenges

Top Azure Strategic 500 customers, including leading telecom service providers and banking and financial services customers, have reported that the previous AMPLS limits were insufficient to meet their growing demands. The need for private links has surged to 3-5 times the previous capacity, impacting network isolation and integration of critical workloads.

Real-World Impact

Our solution now enables customers to scale their Azure Monitor resources significantly, ensuring seamless network configurations and enhanced performance.

Scenario 1: A leading telecom service provider, known for its micro-segmentation architecture, has faced challenges with large-scale monitoring and reporting due to AMPLS limits. With the new solution, the customer can now scale up to 3,000 Log Analytics workspaces and 10,000 Application Insights components with a single AMPLS resource, allowing them to configure over 13,000 Azure Monitor resources effortlessly.
Scenario 2: A leading banking and financial services customer has faced scale challenges in delivering personalized insights due to complex workflows. By utilizing Azure Monitor with network isolation configurations, the customer can now scale their Azure Monitor resources to ensure secure telemetry flow and compliance. They have enabled thousands of Azure Monitor resources configured with AMPLS.

Key Benefits to the Customer

We believe that the solution our team has developed will significantly improve our customers' experience, allowing them to manage their resources more efficiently and effectively with private links using AMPLS.

- An AMPLS object can now connect up to 3,000 Log Analytics workspaces and 10,000 Application Insights components (10x increase).
  - The Log Analytics workspace limit has increased from 300 to 3,000 (10x increase).
  - The Application Insights limit has increased from 1,000 to 10,000 (10x increase).
- An Azure Monitor resource can now connect to up to 100 AMPLSs (20x increase). This applies to:
  - Data Collection Endpoints (DCE)
  - Log Analytics workspaces (LA WS)
  - Application Insights components (AI)
- An AMPLS object can connect to 10 private endpoints at most.
- The AMPLS user experience has been redesigned to load 13K+ resources with pagination.

Call to Action

Explore the new capabilities of Azure Monitor Private Link Scope (AMPLS) and see how it can transform your network isolation and resource management. Visit our Azure Monitor Private Link Scope (AMPLS) documentation page for more details and start leveraging these enhancements today!

For detailed information on configuring Azure Monitor Private Link Scope and Azure Monitor resources, please refer to the following links:
- Configure Azure Monitor Private Link Scope (AMPLS)
- Configure Private Link for Azure Monitor

Azure Managed Grafana Brings Grafana 11 and More
We’re thrilled to announce the public preview of Grafana 11 and several feature enhancements in Azure Managed Grafana based on your feedback. We continue to evolve our service to deliver what matters most to our customers.

Grafana 11

This annual major update to Grafana includes new functionality and improvements across dashboards, panels, queries, and alerts. The current preview in Managed Grafana offers Grafana v11.2. It includes the following key features:

- Explore Metrics
- Scenes-powered dashboards
- Subfolders
- Numerous improvements to canvas visualization and alerting

For more information on Grafana 11, please refer to What’s new in Grafana v11.0, v11.1, and v11.2, and consider how the breaking changes may impact your specific use cases.

You’ll need to create a new Managed Grafana instance to use the Grafana 11 preview. Upgrading directly from Grafana 10 isn’t supported yet. You can copy over dashboards from your current Managed Grafana instance by following the steps in Migrate to Azure Managed Grafana. Please note that not all Grafana 11 features are available in Managed Grafana at present; where applicable, more features will be added over time.

Azure Monitor Updates for Grafana 11

Improved Azure Monitor Logs visualizations

This update extends Azure Monitor Logs visualizations to support Basic Logs. You can now view Azure Monitor Logs tables that have been configured with the lower-cost Basic Logs tier in Explore and dashboard panels. Additionally, Azure Monitor Logs details can now be viewed in Grafana Explore and Logs panels. You can filter query results by column values, run ad-hoc statistics, and choose which columns to display using simple point-and-click interaction, without needing to modify the query text. Explore views also include options to view JSON data in dynamic columns. Azure Kubernetes Service users can leverage these views in a new Container Log dashboard.
Prometheus Exemplars support for Azure Monitor Application Insights traces

You can now drill down from Prometheus exemplars to Application Insights traces in Grafana. Using exemplars in your troubleshooting workflow improves triage and analysis response times by allowing you to navigate from metrics to sample traces related to errors and exceptions, and to easily compare the performance of transactions. To take advantage of this capability, the application needs to be instrumented to emit Prometheus metrics with exemplars, and traces to Azure Monitor Application Insights. Sign up for the Private Preview of Exemplars support in your Azure Monitor Workspace.

User-Assigned Managed Identity

Since its inception, Managed Grafana has set up a system-assigned managed identity for each new Grafana workspace by default. You can use this managed identity as the security principal to access backend data sources connected to your workspace. While it’s convenient to use, a system-assigned managed identity isn’t always suitable. Enterprise customers who have stricter identity management policies typically create and manage all Entra ID identities themselves. Managed Grafana now allows these customers to use identities defined in their Entra ID tenants instead. With the user-assigned managed identity feature, you can select an existing Entra ID identity to be used for authentication and authorization with your data sources. Please note that you can choose only one type of managed identity for each workspace. You can’t enable both system-assigned and user-assigned managed identities simultaneously.

Grafana Settings

Grafana server settings allow you to customize specific server behaviors. Managed Grafana configures and manages these settings automatically, so you don’t have to deal with them. There are some settings where usage varies from user to user. Managed Grafana now gives you the option to change their default values.
The currently supported ones are:

- viewers_can_edit: determines whether users with the Grafana Viewer role can edit dashboards
- external_enabled: controls the public sharing of snapshots

Grafana Migration Tool

If you have a self-hosted Grafana server on-premises or in the cloud that you’d like to migrate to Managed Grafana, you can perform this operation with one command in the Azure CLI. The new az grafana migrate command automates the process of copying your existing dashboards from any Grafana server to your Managed Grafana workspace. It supports several options that control how the content migration should be conducted, as well as a dry-run option that lets you test and see the migration results before committing to the operation.

Let Us Know How We’re Doing

If you’re a current user of Managed Grafana, we’d love to hear from you. Please take a moment to fill out this online survey. It will help us further improve our service to better serve you. Thank you!

Video plugin for Managed Grafana
The video plugin (https://grafana.com/grafana/plugins/innius-video-panel/?tab=overview) is currently unavailable on Azure Managed Grafana. This plugin would be incredibly useful for our dashboards and is very popular on Grafana, with 3,463,106 downloads.

General Availability: Kubernetes Metadata and Logs Filtering in Azure Monitor Container Insights
Today at Ignite, we are thrilled to announce the General Availability of Kubernetes Metadata and Logs Filtering in Azure Monitor Container Insights! This enhancement brings additional Kubernetes metadata to the ContainerLogV2 schema, including PodLabels, PodAnnotations, PodUid, Image, ImageID, ImageRepo, and ImageTag. Moreover, the new Logs Filtering feature allows for precise filtering of both workload and system pods/containers. These advancements give users richer context and enhanced visibility into their workloads, and they are crucial for troubleshooting because they provide deeper insight into the Kubernetes environment.

Key Features

- Enhanced ContainerLogV2 schema with Kubernetes Metadata Fields: Detailed metadata fields enhance log analysis. These include “podLabels,” “podAnnotations,” “podUid,” “image,” “imageID,” “imageRepo,” and “imageTag.”
- Customized Include List Configuration: Users can tailor metadata fields via ConfigMap. All fields are collected by default.
- Enhanced ContainerLogV2 schema with Log Level: Assess application health with color-coded severity levels (e.g., CRITICAL, ERROR, WARNING). This helps incident response and proactive monitoring.
- Annotation-Based Log Filtering for workloads: Filter logs efficiently through podAnnotations. Focus on relevant information, optimizing costs and resource usage.
- ConfigMap-Based Log Filtering for platform logs (System Kubernetes Namespaces): Configure log collection for specific pods within the system namespaces through ConfigMap.
- Grafana Dashboard for Visualization: Use the Grafana dashboard to visualize log levels, log volume, rate, records, and more, for in-depth analysis and real-time monitoring.

To learn more and enable this new feature, please visit our Kubernetes Metadata and Logs Filtering Documentation.
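To illustrate how annotation-based filtering works in principle, here is a minimal Python sketch of the opt-out decision. The annotation keys are an assumption based on Fluent Bit's Kubernetes filter conventions (this post does not name them), and the function is a hypothetical model of the decision, not the agent's actual code; check the linked documentation for the supported annotations.

```python
# Annotation keys assumed from Fluent Bit's Kubernetes filter conventions;
# they are not stated in this post, so verify them against the documentation.
EXCLUDE_ALL = "fluentbit.io/exclude"
EXCLUDE_BY_STREAM = {
    "stdout": "fluentbit.io/exclude_stdout",
    "stderr": "fluentbit.io/exclude_stderr",
}


def is_stream_collected(pod_annotations, stream):
    """Return False when the pod's annotations opt the given stream out of collection."""
    if stream not in EXCLUDE_BY_STREAM:
        raise ValueError(f"unknown stream: {stream!r}")

    def opted_out(key):
        # Annotation values are strings; treat only "true" (any case) as an opt-out.
        return str(pod_annotations.get(key, "")).strip().lower() == "true"

    return not (opted_out(EXCLUDE_ALL) or opted_out(EXCLUDE_BY_STREAM[stream]))


if __name__ == "__main__":
    noisy_pod = {"fluentbit.io/exclude_stdout": "true"}
    for stream in ("stdout", "stderr"):
        print(stream, "collected:", is_stream_collected(noisy_pod, stream))
```

The design point is that filtering is declared on the workload itself, so a team can silence a chatty pod without touching the cluster-wide agent configuration.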
If you have any questions or feedback on Kubernetes Logs Metadata and Filtering, please reach out to ibraraslam@microsoft.com or fill out this survey!

Monitoring GPU Metrics in AKS with Azure Managed Prometheus, DCGM Exporter and Managed Grafana
Azure Managed Prometheus (https://aka.ms/managedpromdocumentation) provides a production-grade solution for monitoring without the hassle of installation and maintenance. By leveraging these managed services, you can focus on extracting insights from your metrics and logs rather than managing the underlying infrastructure.

The integration of essential GPU metrics (such as Framebuffer Memory Usage, GPU Utilization, Tensor Core Utilization, and SM Clock Frequencies) into Azure Managed Prometheus and Grafana enhances the visualization of actionable insights. This integration facilitates a comprehensive understanding of GPU consumption patterns, enabling more informed decisions regarding optimization and resource allocation.

Azure Managed Prometheus has announced Operator and CRD support (https://aka.ms/ampcrdblog), which will enable customers to customize metrics collection and add scraping of metrics from workloads and applications using Service and Pod Monitors, similar to the OSS Prometheus Operator. This blog demonstrates how we leveraged the CRD/Operator support in Azure Managed Prometheus and used the Nvidia DCGM Exporter and Grafana to enable GPU monitoring.

GPU monitoring

As the use of GPUs has skyrocketed for deploying large language models (LLMs) for both inference and fine-tuning, monitoring these resources becomes critical to ensure optimal performance and utilization. Prometheus (https://prometheus.io/docs/introduction/overview/), an open-source monitoring and alerting toolkit, coupled with Grafana (https://grafana.com/docs/grafana/latest/fundamentals/), a powerful dashboarding and visualization tool, provides an excellent solution for collecting, visualizing, and acting on these metrics. Essential metrics such as Framebuffer Memory Usage, GPU Utilization, Tensor Core Utilization, and SM Clock Frequencies serve as fundamental indicators of GPU consumption. They offer insight into the performance and efficiency of the GPUs, which helps us reduce our COGS and improve operations.
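To make the four essential metrics concrete, the following Python sketch picks them out of Prometheus text-format output. The DCGM field names and units in the mapping are assumptions based on common DCGM Exporter output and are not stated in this post, and the sample scrape is invented for illustration, so verify both against what your exporter actually emits.

```python
import re

# Assumed mapping from the metrics named in this post to DCGM Exporter
# field names; confirm against your exporter's /metrics output.
ESSENTIAL_METRICS = {
    "DCGM_FI_DEV_FB_USED": "Framebuffer Memory Usage (MiB)",
    "DCGM_FI_DEV_GPU_UTIL": "GPU Utilization (%)",
    "DCGM_FI_PROF_PIPE_TENSOR_ACTIVE": "Tensor Core Utilization (ratio)",
    "DCGM_FI_DEV_SM_CLOCK": "SM Clock Frequency (MHz)",
}

_SAMPLE = re.compile(
    r"^([A-Za-z_:][A-Za-z0-9_:]*)"  # metric name
    r"(?:\{(.*)\})?"                # optional {label="value",...} block
    r"\s+(\S+)"                     # sample value
    r"(?:\s+-?\d+)?$"               # optional timestamp
)


def essential_gpu_samples(exposition_text):
    """Return (friendly name, raw labels, value) for each essential metric sample."""
    samples = []
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = _SAMPLE.match(line)
        if match is None:
            continue
        name, labels, value = match.groups()
        if name in ESSENTIAL_METRICS:
            samples.append((ESSENTIAL_METRICS[name], labels or "", float(value)))
    return samples


# Invented sample data, for illustration only.
EXAMPLE_SCRAPE = """\
# HELP DCGM_FI_DEV_GPU_UTIL GPU utilization (in %).
# TYPE DCGM_FI_DEV_GPU_UTIL gauge
DCGM_FI_DEV_GPU_UTIL{gpu="0",Hostname="gpu-node-1"} 87
DCGM_FI_DEV_FB_USED{gpu="0",Hostname="gpu-node-1"} 30512
DCGM_FI_DEV_GPU_TEMP{gpu="0",Hostname="gpu-node-1"} 64
"""

if __name__ == "__main__":
    for friendly_name, labels, value in essential_gpu_samples(EXAMPLE_SCRAPE):
        print(f"{friendly_name}: {value} [{labels}]")
```

In practice Azure Managed Prometheus does the scraping for you; a script like this is mainly useful for spot-checking an exporter pod's output before wiring up dashboards.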
Using Nvidia’s DCGM Exporter with Azure Managed Prometheus

The DCGM Exporter (https://docs.nvidia.com/datacenter/cloud-native/gpu-telemetry/latest/dcgm-exporter.html) is a tool developed by Nvidia to collect and export GPU metrics. It runs as a pod on Kubernetes clusters and gathers various metrics from Nvidia GPUs, such as utilization, memory usage, temperature, and power consumption. These metrics are crucial for monitoring and managing the performance of GPUs. You can integrate this exporter with Azure Managed Prometheus. The sections below describe the steps and changes needed to deploy the DCGM Exporter successfully.

Prerequisites

Before we jump straight to the installation, ensure your AKS cluster meets the following requirements:

- GPU Node Pool: Add a node pool (https://learn.microsoft.com/azure/aks/create-node-pools) with the required VM SKU that includes GPU support.
- GPU Driver: Ensure the GPU driver (https://learn.microsoft.com/azure/aks/gpu-cluster?tabs=add-ubuntu-gpu-node-pool) is running as a DaemonSet on your GPU nodes.
- Enable (https://learn.microsoft.com/azure/azure-monitor/containers/kubernetes-monitoring-enable?tabs=cli) Azure Managed Prometheus and Azure Managed Grafana on your AKS cluster.

Refactoring Nvidia DCGM Exporter for AKS: Code Changes and Deployment Guide

Updating API Versions and Configurations for Seamless Integration

As per the official documentation, the best way to get started with the DCGM Exporter is to install it using Helm. When installing on AKS with Managed Prometheus, you might encounter the error below:

    Error: Installation Failed: Unable to build Kubernetes objects from release manifest: resource mapping not found for name: "dcgm-exporter-xxxxx" namespace: "default" from "": no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1". Ensure CRDs are installed first.
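The error comes from the chart requesting a ServiceMonitor in the monitoring.coreos.com/v1 API group, while the Azure Managed Prometheus CRDs use azmonitoring.coreos.com/v1. As a rough sketch, the apiVersion change can be scripted in Python rather than edited by hand. The function name and sample manifest are hypothetical, and the script only rewrites the text of the apiVersion line; it does not parse or validate the YAML or Helm template.

```python
import re

OSS_API_VERSION = "monitoring.coreos.com/v1"
AZURE_API_VERSION = "azmonitoring.coreos.com/v1"

# Match a whole apiVersion line whose value is exactly the OSS group/version,
# optionally quoted. Lines that already use the Azure group do not match.
_API_VERSION_LINE = re.compile(
    r"^([ \t]*apiVersion:[ \t]*)[\"']?"
    + re.escape(OSS_API_VERSION)
    + r"[\"']?([ \t]*)$",
    re.MULTILINE,
)


def use_azure_monitoring_api(manifest_text):
    """Point ServiceMonitor/PodMonitor manifests at the Azure Managed Prometheus API group."""
    return _API_VERSION_LINE.sub(
        lambda m: m.group(1) + AZURE_API_VERSION + m.group(2),
        manifest_text,
    )


if __name__ == "__main__":
    # Hypothetical, trimmed-down manifest for demonstration.
    sample = (
        "apiVersion: monitoring.coreos.com/v1\n"
        "kind: ServiceMonitor\n"
        "metadata:\n"
        "  name: dcgm-exporter\n"
    )
    print(use_azure_monitoring_api(sample))
```

Because already-converted lines do not match the pattern, running the rewrite twice leaves the manifest unchanged, which makes it safe to keep in a repeatable packaging script.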
To resolve this, follow these steps to make the necessary changes in the DCGM Exporter repository (https://github.com/NVIDIA/dcgm-exporter):

1. Clone the Project: Go to the GitHub repository of the DCGM Exporter and clone the project or download it to your local machine.

2. Navigate to the Template Folder: The code used to deploy the DCGM Exporter is located in the template folder within the deployment folder.

3. Modify the service-monitor.yaml File: Find the file service-monitor.yaml. The apiVersion key in this file needs to be updated from monitoring.coreos.com/v1 to azmonitoring.coreos.com/v1. This change allows the DCGM Exporter to use the Azure Managed Prometheus CRD.

    apiVersion: azmonitoring.coreos.com/v1

4. Handle Node Selectors and Tolerations: GPU node pools often have tolerations and node selector tags. Modify the values.yaml file in the deployment folder to handle these configurations:

    nodeSelector:
      accelerator: nvidia
    tolerations:
      - key: "sku"
        operator: "Equal"
        value: "gpu"
        effect: "NoSchedule"

Helm: Packaging, Pushing, and Installation on Azure Container Registry

We followed the Azure Container Registry Helm documentation (https://learn.microsoft.com/azure/container-registry/container-registry-helm-repos) for pushing and installing the package through Helm on Azure Container Registry. For a comprehensive understanding, you can refer to the documentation. Here are the quick steps for installation. After making all the necessary changes in the deployment folder of the source code, change to that directory to package the code, and log in to your registry to proceed.

1. Package the Helm chart and log in to your container registry:

    helm package .
    helm registry login <container-registry-url> --username $USER_NAME --password $PASSWORD

2. Push the Helm chart to the registry:

    helm push dcgm-exporter-3.4.2.tgz oci://<container-registry-url>/helm

3. Verify in the Azure portal that the package has been pushed to the registry.

4. Install the chart and verify the installation:

    helm install dcgm-nvidia oci://<container-registry-url>/helm/dcgm-exporter -n gpu-resources

    # Check the installation on your AKS cluster by running:
    helm list -n gpu-resources

    # Verify the DCGM Exporter:
    kubectl get po -n gpu-resources
    kubectl get ds -n gpu-resources

You can now check that the DCGM Exporter is running on the GPU nodes as a DaemonSet.

Exporting GPU Metrics and Configuring Azure Managed Grafana Dashboard

Once the DCGM Exporter DaemonSet is running across all GPU node pools, you need to export the GPU metrics generated by this workload to Azure Managed Prometheus. This is accomplished by deploying a PodMonitor resource. Follow these steps:

1. Deploy the PodMonitor: Apply the following YAML configuration to deploy the PodMonitor:

    apiVersion: azmonitoring.coreos.com/v1
    kind: PodMonitor
    metadata:
      name: nvidia-dcgm-exporter
      labels:
        app.kubernetes.io/name: nvidia-dcgm-exporter
    spec:
      selector:
        matchLabels:
          app.kubernetes.io/name: nvidia-dcgm-exporter
      podMetricsEndpoints:
        - port: metrics
          interval: 30s
      podTargetLabels:

2. Check that the PodMonitor is deployed and running by executing:

    kubectl get podmonitor -n <namespace>

3. Verify Metrics Export: Ensure that the metrics are being exported to Azure Managed Prometheus by navigating to the "Metrics" page of your Azure Monitor Workspace in the portal.

Create the DCGM Dashboard on Azure Managed Grafana

The GitHub repository for the DCGM Exporter includes a JSON file (https://github.com/NVIDIA/dcgm-exporter/blob/main/grafana/dcgm-exporter-dashboard.json) for the Grafana dashboard. Follow the Managed Grafana documentation (https://learn.microsoft.com/azure/managed-grafana/how-to-create-dashboard?tabs=azure-portal) to import this JSON into your Managed Grafana instance. After importing the JSON, the dashboard displaying GPU metrics will be visible in Grafana.

Azure Monitor cost optimization using Azure Advisor
Azure Advisor is a free offering that can help you avoid problems and save money by providing you with proactive best practice guidance. We in Azure Monitor are committed to assisting you in optimizing your budget allocation, making informed decisions about monitoring options, and discovering features and configurations that enable you to get more out of your infrastructure.