azure kubernetes service

186 Topics

Generally Available - Azure Monitor Private Link Scope (AMPLS) Scale Limits Increased by 10x!
Introduction
We are excited to announce the General Availability (GA) of the Azure Monitor Private Link Scope (AMPLS) scale limit increase, which delivers a 10x scalability improvement over the previous limits. This enhancement lets customers securely connect more Azure Monitor resources via Private Link, ensuring network isolation, compliance, and Zero Trust alignment for large-scale environments.

What is Azure Monitor Private Link Scope (AMPLS)?
Azure Monitor Private Link Scope (AMPLS) is a feature that allows you to securely connect Azure Monitor resources to your virtual network using private endpoints. This ensures that your monitoring data is accessed only through authorized private networks, preventing data exfiltration and keeping all traffic inside the Azure backbone network.

AMPLS: Scale Limits Increased by 10x in Public Cloud and Sovereign Cloud (Fairfax/Mooncake) Regions
As part of the General Availability release, the scale limits for AMPLS have increased tenfold (10x) in Public and Sovereign Cloud regions. Customers can now manage their resources more efficiently and securely with private links using AMPLS, ensuring that workload logs are routed via the Microsoft backbone network.

What’s New?
- 10x Scale Increase: Connect up to 3,000 Log Analytics workspaces per AMPLS (previously 300) and up to 10,000 Application Insights components per AMPLS (previously 1,000).
- 20x Resource Connectivity: Each Azure Monitor resource can now connect to 100 AMPLS resources (previously 5).
- Enhanced UX/UI: The redesigned AMPLS interface supports loading 13,000+ resources with pagination for smooth navigation.
- Private Endpoint Support: Each AMPLS object can connect to 10 private endpoints, ensuring secure telemetry flows.

Why It Matters
Top Azure Strategic 500 customers, including major Telecom service providers and Banking & Financial Services organizations, have noted that the previous AMPLS limits did not adequately support their increasing requirements. The demand for private links has grown to 3–5 times the existing capacity, affecting both network isolation and the integration of essential workloads. This General Availability release resolves these issues, providing centralized monitoring at scale while maintaining robust security and performance.

Customer Stories
Our solution now enables customers to scale their Azure Monitor resources significantly, ensuring seamless network configurations and enhanced performance.

Customer A - Case Study: Leading Banking & Financial Services Customer
Challenge: The banking customer faced complexity in delivering personalized insights due to intricate workflows and content systems. They needed a solution that could scale securely while maintaining compliance and performance for business-critical applications.
Solution: The banking customer implemented Azure Monitor Private Link Scope (AMPLS) to enhance the security and performance of financial models for smart finance assistants, leading to greater efficiency and improved client engagement. To ensure secure telemetry flow and compliance at the required scale, the customer leveraged the AMPLS Scale Limit Increase feature.
Business Impact:
- Strengthened security posture aligned with Zero Trust principles
- Improved operational efficiency for monitoring and reporting
- Delivered a future-ready architecture that scales with evolving compliance and performance demands

Customer B - Case Study: Leading Telecom Service Provider - Scaling Secure Monitoring with AMPLS
Architecture: A leading Telecom Service Provider employs a highly micro-segmented design where each DevOps team operates in its own workspace to maximize security and isolation.
Challenge: While this design strengthens security, it introduces complexity for large-scale monitoring and reporting due to physical and logical limitations on Azure Monitor Private Link Scope (AMPLS). The previous scale limits made it difficult to centralize telemetry without compromising isolation.
Solution: The AMPLS Scale Limit Increase feature enabled the Telecom Service Provider to expand its Azure Monitor resources significantly. Monitoring traffic now routes through Microsoft’s backbone network, reducing data exfiltration risks and supporting Zero Trust principles.
Impact & Benefits:
- Scalability: Supports up to 3,000 Log Analytics workspaces and 10,000 Application Insights components per AMPLS (10x increase).
- Efficiency: Each Azure Monitor resource can now connect to 100 AMPLS resources (20x increase).
- Security: Private connectivity via the Microsoft backbone mitigates data exfiltration risks.
- Operational Excellence: Simplifies configuration for 13K+ Azure Monitor resources, reducing overhead for DevOps teams.

Customer Benefits & Results
Our solution significantly enhances customers’ ability to manage Azure Monitor resources securely and at scale using Azure Monitor Private Link Scope (AMPLS).

Key Benefits
- Massive Scale Increase: Each AMPLS object can now connect to 3,000 Log Analytics workspaces (previously 300) and 10,000 Application Insights components (previously 1,000). Each Azure Monitor resource can now connect to up to 100 AMPLS resources (20x increase).
- Broader Resource Support: Supported resource types include Data Collection Endpoints (DCE), Log Analytics workspaces (LA WS), and Application Insights components (AI).
- Improved UX/UI: The redesigned AMPLS interface supports loading 13,000+ Azure Monitor resources with pagination for smooth navigation.
- Private Endpoint Connectivity: Each AMPLS object can connect to 10 private endpoints, ensuring secure telemetry flows.

Resources
Explore the new capabilities of Azure Monitor Private Link Scope (AMPLS) and see how they can improve your network isolation and resource management. Visit our Azure Monitor Private Link Scope (AMPLS) documentation page for more details and start using these enhancements today. For detailed information on configuring Azure Monitor Private Link Scope and Azure Monitor resources, please refer to the following links:
- Use Azure Private Link to connect networks to Azure Monitor - Azure Monitor | Microsoft Learn
- Design your Azure Private Link setup - Azure Monitor | Microsoft Learn
- Configure your private link - Azure Monitor | Microsoft Learn
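To make the new limits concrete, here is a minimal planning sketch that checks a proposed AMPLS layout against the numbers quoted in this announcement. The `AmplsPlan` type and `check_plan` function are hypothetical helpers invented for illustration, not part of any Azure SDK, and the sketch assumes the workspace and component limits apply independently of each other.

```python
from dataclasses import dataclass

# Limits as quoted in the announcement (GA values).
MAX_LOG_ANALYTICS_WORKSPACES_PER_AMPLS = 3_000
MAX_APP_INSIGHTS_COMPONENTS_PER_AMPLS = 10_000
MAX_PRIVATE_ENDPOINTS_PER_AMPLS = 10
MAX_AMPLS_PER_RESOURCE = 100


@dataclass
class AmplsPlan:
    """Hypothetical description of one planned AMPLS (not an Azure SDK type)."""
    log_analytics_workspaces: int
    app_insights_components: int
    private_endpoints: int


def check_plan(plan: AmplsPlan) -> list:
    """Return human-readable violations; an empty list means the plan fits."""
    checks = [
        ("Log Analytics workspaces", plan.log_analytics_workspaces,
         MAX_LOG_ANALYTICS_WORKSPACES_PER_AMPLS),
        ("Application Insights components", plan.app_insights_components,
         MAX_APP_INSIGHTS_COMPONENTS_PER_AMPLS),
        ("private endpoints", plan.private_endpoints,
         MAX_PRIVATE_ENDPOINTS_PER_AMPLS),
    ]
    return [
        f"{label}: {count} exceeds the limit of {limit}"
        for label, count, limit in checks
        if count > limit
    ]


def resource_can_join_another_ampls(current_ampls_links: int) -> bool:
    """True if a single Azure Monitor resource is still below the 100-AMPLS limit."""
    return current_ampls_links < MAX_AMPLS_PER_RESOURCE
```

A plan with 2,500 workspaces, 9,000 components, and 4 private endpoints would pass, while the same plan under the previous limits (300 and 1,000) would have needed several scopes.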
Mahesh_Sundaram
Nov 20, 2025 | Azure Observability Blog
Azure app platform at Ignite 2025: New innovations for all your apps and agents
Learn more about the latest Azure app platform innovations at Microsoft Ignite 2025 that help you build, modernize, and run all your apps and agents.
NagaSurendran
Nov 18, 2025 | Apps on Azure Blog
Announcing Advanced Kubernetes Troubleshooting Agent Capabilities (preview) in Azure Copilot
What’s new?
Today, we're announcing Kubernetes troubleshooting agent capabilities in Azure Copilot, offering an intuitive, guided agentic experience that helps users detect, triage, and resolve common Kubernetes issues in their AKS clusters. The agent provides root cause analysis for Kubernetes clusters and resources and is triggered by Kubernetes-specific keywords. It detects problems such as resource failures and scaling bottlenecks, correlates signals across metrics and events by running `kubectl` commands as it reasons, and proposes actionable solutions. By simplifying complex diagnostics and offering clear next steps, the agent empowers users to troubleshoot independently.

How it works
With the Kubernetes troubleshooting agent, Azure Copilot automatically investigates issues in your cluster by running targeted `kubectl` commands and analyzing your cluster’s configuration and current state. For instance, it examines failing or pending pods, cluster events, resource utilization metrics, and configuration details to build a complete picture of what’s causing the issue. Azure Copilot then determines the most effective mitigation steps for your specific environment. It provides clear, step-by-step guidance and, in many cases, offers a one-click fix to resolve the issue automatically. If Azure Copilot can’t fully resolve the problem, it can generate a pre-populated support request with all the diagnostic details Microsoft Support needs. You’ll be able to review and confirm everything before the request is submitted. This agent is available via Azure Copilot in the Azure portal. Learn more about how Azure Copilot works.

How to Get Started
To start using agents, your global administrator must request access to the agents preview at the tenant level in the Azure Copilot admin center. This confirms your interest in the preview and allows us to enable access. Once approved, users will see the Agent mode toggle in Azure Copilot chat and can then start using Copilot agents. Capacity is limited, so sign up early for the best chance to participate. Additionally, if you are interested in helping shape the future of agentic cloud ops and the role Copilot will play in it, please join our customer feedback program by filling out this form.
- Agents (preview) in Azure Copilot | Microsoft Learn

Troubleshooting sample prompts
From an AKS cluster resource, click Kubernetes troubleshooting with Copilot to automatically open Azure Copilot in the context of the resource you want to troubleshoot.

Try These Prompts to Get Started
Here are a few examples of the kinds of prompts you can use. If you're not already working in the context of a resource, you may need to provide the specific resource that you want to troubleshoot.
- "My pod keeps restarting can you help me figure out why"
- "Pods are stuck pending what is blocking them from being scheduled"
- "I am getting ImagePullBackOff how do I fix this"
- "One of my nodes is NotReady what is causing it"
- "My service cannot reach the backend pod what should I check"
Note: When using these kinds of prompts, be sure agent mode is enabled by selecting the agent mode icon in the chat window.

Learn More
- Troubleshooting agent capabilities in Agents (preview) in Azure Copilot | Microsoft Learn
- Announcing the CLI Agent for AKS: Agentic AI-powered operations and diagnostics at your fingertips - AKS Engineering Blog
- Microsoft Copilot in Azure Series - Kubectl | Microsoft Community Hub
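For readers who want a feel for the kind of signal the sample prompts refer to, the following is a rough sketch of manual triage over the JSON that `kubectl get pods -o json` returns. It is not how Azure Copilot is implemented; the `triage_pods` function, the restart threshold, and the sample data are assumptions made up for illustration. The field names (`status.phase`, `containerStatuses`, `restartCount`, `state.waiting.reason`) are standard Kubernetes pod status fields.

```python
import json

# Waiting reasons that usually indicate a pod needs attention.
PROBLEM_REASONS = ("ImagePullBackOff", "ErrImagePull", "CrashLoopBackOff")


def triage_pods(pod_list: dict, restart_threshold: int = 3) -> list:
    """Flag Pending pods, known bad waiting reasons, and frequent restarts."""
    findings = []
    for pod in pod_list.get("items", []):
        name = pod["metadata"]["name"]
        status = pod.get("status", {})
        if status.get("phase") == "Pending":
            findings.append({"pod": name, "issue": "Pending"})
        for cs in status.get("containerStatuses", []):
            waiting = cs.get("state", {}).get("waiting") or {}
            reason = waiting.get("reason")
            if reason in PROBLEM_REASONS:
                findings.append({"pod": name, "issue": reason})
            elif cs.get("restartCount", 0) >= restart_threshold:
                findings.append({"pod": name, "issue": "FrequentRestarts"})
    return findings


# Hypothetical, heavily trimmed sample of `kubectl get pods -o json` output.
SAMPLE = json.loads("""
{"items": [
  {"metadata": {"name": "web"},
   "status": {"phase": "Running",
              "containerStatuses": [{"restartCount": 5, "state": {"running": {}}}]}},
  {"metadata": {"name": "api"},
   "status": {"phase": "Pending",
              "containerStatuses": [{"restartCount": 0,
                                     "state": {"waiting": {"reason": "ImagePullBackOff"}}}]}},
  {"metadata": {"name": "ok"},
   "status": {"phase": "Running",
              "containerStatuses": [{"restartCount": 0, "state": {"running": {}}}]}}
]}
""")

if __name__ == "__main__":
    for finding in triage_pods(SAMPLE):
        print(finding["pod"], finding["issue"])
```

Against a real cluster, the same function could be fed the parsed output of `kubectl get pods --all-namespaces -o json`. A pod stuck pulling its image is reported twice here (once as Pending, once with the waiting reason), which is a deliberate simplification.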
Samantha_Fernandez
Nov 18, 2025 | Apps on Azure Blog
Building AI apps and agents for the new frontier
Every new wave of applications brings with it the promise of reshaping how we work, build and create. From digitization to web, from cloud to mobile, these shifts have made us all more connected, more engaged and more powerful. The incoming wave of agentic applications, estimated to number 1.3 billion over the next 2 years[1] is no different. But the expectations of these new services are unprecedented, in part for how they will uniquely operate with both intelligence and agency, how they will act on our behalf, integrated as a member of our teams and as a part of our everyday lives. The businesses already achieving the greatest impact from agents are what we call Frontier Organizations. This week at Microsoft Ignite we’re showcasing what the best frontier organizations are delivering, for their employees, for their customers and for their markets. And we’re introducing an incredible slate of innovative services and tools that will help every organization achieve this same frontier transformation. What excites me most is how frontier organizations are applying AI to achieve their greatest level of creativity and problem solving. Beyond incremental increases in efficiency or cost savings, frontier firms use AI to accelerate the pace of innovation, shortening the gap from prototype to production, and continuously refining services to drive market fit. Frontier organizations aren’t just moving faster, they are using AI and agents to operate in novel ways, redefining traditional business processes, evolving traditional roles and using agent fleets to augment and expand their workforce. To do this they build with intent, build for impact and ground services in deep, continuously evolving, context of you, your organization and your market that makes every service, every interaction, hyper personalized, relevant and engaging. Today we’re announcing new capabilities that help you build what was previously impossible. 
To launch and scale fleets of agents in an open system across models, tools, and knowledge. And to run and operate agents with the confidence that every service is secure, governed and trusted. The question is, how do you get there? How do you build the AI apps and agents fueling the future? Read further for just a few highlights of how Microsoft can help you become frontier:

Build with agentic DevOps
Perhaps the greatest area of agentic innovation today is in service of developers. Microsoft’s strategy for agentic DevOps is redefining the developer experience to be AI-native, extending the power of AI to every stage of the software lifecycle and integrating AI services into the tools embraced by millions of developers. At Ignite, we’re helping every developer build faster, build with greater quality and security, and deliver increasingly innovative apps that will shape their businesses. Across our developer services, AI agents now operate like active members of your development and operations teams: collaborating, automating, and accelerating every phase of the software development lifecycle. From planning and coding to deployment and production, agents are reshaping how we build. Developers can now orchestrate fleets of agents, assigning them tasks such as code reviews, testing, defect resolution, and even modernization of legacy Java and .NET applications. We continue to take this strategy forward with a new generation of AI-powered tools: GitHub Agent HQ, which will soon make coding agents like Codex, Claude Code, and Jules available directly in GitHub and Visual Studio Code; Custom Agents, which encode domain expertise; and “bring your own models,” which empowers teams to adapt and innovate. These advancements make GitHub Copilot the world’s most popular AI pair programmer, serving over 26 million users and helping organizations like Pantone, Ahold Delhaize USA, and Commerzbank streamline processes and save time.
Within Microsoft’s own developer teams, we’re seeing transformative results with agentic DevOps. GitHub Copilot coding agent is now a top contributor, not only to GitHub’s core application but also to our major open-source projects like the Microsoft Agent Framework and Aspire. Copilot is reducing task completion time from hours to minutes and eliminating up to two weeks of manual development effort for complex work. Across Microsoft, 90% of pull requests are now covered by GitHub Copilot code review, increasing the pace of PR completion. Our AI-powered assistant for Microsoft’s engineering ecosystem is deeply integrated into VS Code, Teams, and other tools, giving engineers and product managers real-time, context-aware answers where they work and saving 2.2K developer days in September alone. For app modernization, GitHub Copilot has reduced modernization project timelines by as much as 88%. In production environments, Azure SRE agent has handled over 7K incidents and collected diagnostics on over 18K incidents, saving over 10,000 hours for on-call engineers. These results underscore how agentic workflows are redefining speed, scale, and reliability across the software lifecycle at Microsoft.

Launch at speed and scale with a full-stack AI app and agent platform
We’re making it easier to build, run, and scale AI agents that deliver real business outcomes. To accelerate the path to production for advanced AI applications and agents, Microsoft is delivering a complete and flexible foundation that helps every organization move with speed and intelligence without compromising security, governance or operations. Microsoft Foundry helps organizations move from experimentation to execution at scale, providing the organization-wide observability and control that production AI requires. More than 80,000 customers, including 80% of the Fortune 500, use Microsoft Foundry to build, optimize, and govern AI apps and agents today.
Foundry supports open frameworks like the Microsoft Agent Framework for orchestration, standard protocols like Model Context Protocol (MCP) for tool calling, and expansive integrations that enable context-aware, action-oriented agents. Companies like Nasdaq, Softbank, Sierra AI, and Blue Yonder are shipping innovative solutions with speed and precision. New at Ignite this year:
- Foundry Models: With more than 11,000 models, including OpenAI’s GPT-5, Anthropic’s Claude, and Microsoft’s Phi, Foundry delivers the broadest model selection on any cloud. Developers have the power to benchmark, compare, and dynamically route models to optimize performance for every task. Model router is now generally available in Microsoft Foundry and in public preview in Foundry Agent Service.
- Foundry IQ: Delivers the deep context needed to make every agent grounded, productive, and reliable. Foundry IQ, now available in public preview, reimagines retrieval-augmented generation (RAG) as a dynamic reasoning process rather than a one-time lookup. Powered by Azure AI Search, it centralizes RAG workflows into a single grounding API, simplifying orchestration and improving response quality while respecting user permissions and data classifications.
- Foundry Agent Service: Now offers Hosted Agents, multi-agent workflows, built-in memory, and the ability to deploy agents directly to Microsoft 365 and Agent 365 in public preview.
- Foundry Tools: Empowers developers to create agents with secure, real-time access to business systems, business logic, and multimodal capabilities. Developers can quickly enrich agents with real-time business context, multimodal capabilities, and custom business logic through secure, governed integration with 1,400+ systems and APIs.
- Foundry Control Plane: Now in public preview, it centralizes identity, policy, observability, and security signals and capabilities for AI developers in one portal.

Build on an AI-Ready foundation for all applications
- Managed Instance on Azure App Service lets organizations migrate existing .NET web applications to the cloud without the cost or effort of rewriting code, allowing them to migrate directly into a fully managed platform-as-a-service (PaaS) environment. With Managed Instance, organizations can keep operating applications with critical dependencies on local Windows services, third-party vendor libraries, and custom runtimes without requiring any code changes. The result is faster modernization with lower overhead, and access to cloud-native scalability, built-in security and Azure’s AI capabilities.
- MCP Governance with Azure API Management now delivers a unified control plane for APIs and MCP servers, enabling enterprises to extend their existing API investments directly into the agentic ecosystem with trusted governance, secure access, and full observability.
- Agent Loop and native AI integrations in Azure Logic Apps enable customers to move beyond rigid workflows to intelligent, adaptive automation that saves time and reduces complexity. These capabilities make it easier to build AI-powered, context-aware applications using low-code tools, accelerating innovation without heavy development effort.
- Azure Functions now supports hosting production-ready, reliable AI agents with stateful sessions, durable tool calls, and deterministic multi-agent orchestrations through the durable extension for Microsoft Agent Framework. Developers gain automatic session management, built-in HTTP endpoints, and elastic scaling from zero to thousands of instances, all with pay-per-use pricing and automated infrastructure.
- Azure Container Apps agents and security supercharges agentic workloads with automated deployment of multi-container agents, on-demand dynamic execution environments, and built-in security for runtime protection and data confidentiality.

Run and operate agents with confidence
New at Ignite, we’re also expanding the use of agents to keep every application secure, managed and operating without compromise. Expanded agentic capabilities protect applications from code to cloud and continuously monitor and remediate production issues, while minimizing the effort required from developers, operators and security teams.

Microsoft Defender for Cloud and GitHub Advanced Security: With the rise of multi-agent systems, the security threat surface continues to expand. Increased alert volumes, unprioritized threat signals, unresolved threats and a growing backlog of vulnerabilities are increasing risk for businesses, while security teams and developers often operate in disconnected tools, making collaboration and remediation even more challenging. The new Defender for Cloud and GitHub Advanced Security integration closes this gap, connecting runtime context to code for faster alert prioritization and AI-powered remediation. Runtime context prioritizes security risks with insights that allow teams to focus on what matters most and fix issues faster. When Defender for Cloud finds a threat exposed in production, it can now link to the exact code in GitHub. Developers receive AI-suggested fixes directly inside GitHub, while security teams track progress in Defender for Cloud in real time. This gives both sides a faster, more connected way to identify issues, drive remediation, and keep AI systems secure throughout the app lifecycle.

Azure SRE Agent is an always-on, AI-powered partner for cloud reliability, enabling production environments to become self-healing, proactively resolve issues, and optimize performance. Seamlessly integrated with Azure Monitor, GitHub Copilot, and incident management tools, Azure SRE Agent reduces operational toil. The latest update introduces no-code automation, empowering teams to tailor processes to their unique environments with minimal engineering overhead.
Event-driven triggers enable proactive checks and faster incident response, helping minimize downtime. Expanded observability across Azure and third-party sources is designed to help teams troubleshoot production issues more efficiently, while orchestration capabilities support integration with MCP-compatible tools for comprehensive process automation. Finally, its adaptive memory system is designed to learn from interactions, helping improve incident handling and reduce operational toil, so organizations can achieve greater reliability and cost efficiency.

The future is yours to build
We are living in an extraordinary time, and across Microsoft we’re focused on helping every organization shape their future with AI. Today’s announcements are a big step forward on this journey. Whether you’re a startup fostering the next great concept or a global enterprise shaping your future, we can help you deliver on this vision. The frontier is open. Let’s build beyond expectations and build the future! Check out all the learning at Microsoft Ignite on-demand and read more about the announcements making it happen:

Recommended sessions
- BRK113: Connected, managed, and complete
- BRK103: Modernize your apps in days, not months, with GitHub Copilot
- BRK110: Build AI Apps fast with GitHub and Microsoft Foundry in action
- BRK100: Best practices to modernize your apps and databases at scale
- BRK114: AI Agent architectures, pitfalls and real-world business impact
- BRK115: Inside Microsoft's AI transformation across the software lifecycle

Announcements
- aka.ms/AgentFactory
- aka.ms/AppModernizationBlog
- aka.ms/SecureCodetoCloudBlog
- aka.ms/AppPlatformBlog

[1] IDC Info Snapshot, sponsored by Microsoft, 1.3 Billion AI Agents by 2028, #US53361825, May 2025.
Mike_Hulme
Nov 18, 2025 | Apps on Azure Blog
Announcing General Availability: Azure Monitor dashboards with Grafana
Continuing our commitment to open-source solutions, we are announcing the general availability of Azure Monitor dashboards with Grafana. This service offers a powerful solution for cloud-native monitoring and visualizing all your Azure data. Dashboards with Grafana enable you to create and edit Grafana dashboards directly in the Azure portal at no additional cost and with less administrative overhead than self-hosting Grafana or using managed Grafana services. Built-in Grafana controls and components allow you to apply a rich set of visualization panels and client-side transformations to Azure monitoring data to create custom dashboards.

Start quickly with pre-built and community dashboards
Dozens of pre-built Grafana dashboards for Azure Kubernetes Service, Application Insights, Storage Accounts, Cosmos DB, Azure PostgreSQL, OpenTelemetry metrics and many other Azure resources are included and enabled by default. Additionally, you can import dashboards from thousands of publicly available Grafana community and open-source dashboards for the supported data sources: Prometheus, Azure Monitor (metrics, logs, traces, Azure Resource Graph), and Azure Data Explorer.

Streamline monitoring with open-source compatibility and Azure enterprise capabilities
Azure Monitor dashboards with Grafana are fully compatible with open-source Grafana dashboards and are portable across any Grafana instance regardless of where it is hosted. Furthermore, dashboards are native Azure resources that support Azure RBAC for assigning permissions and automation via ARM and Bicep templates.
- Import, edit and create dashboards in 30+ Azure regions
- Choose from any language in the Azure portal for your Grafana user interface
- Manage dashboard content as part of the ARM resource
- Automatically generate ARM templates to automate deployment and manage dashboards

Take advantage of Grafana Explore and New Dashboards
Leverage Grafana Explore to quickly create ad-hoc queries without modifying dashboards, and add queries and visualizations to new or existing dashboards. New out-of-the-box dashboards are available for additional Azure resources:
- Additional Azure Kubernetes Service support, including AKS Automatic and AKS Arc connected clusters
- Azure Container Apps monitoring dashboards
- Microsoft Foundry monitoring dashboards
- Azure Monitor Application Insights dashboards
- OpenTelemetry metrics
- Microsoft Agent Framework
- High Performance Computing dashboards with dedicated GPU monitoring

When to step up to Azure Managed Grafana?
If you store your telemetry data in Azure, Dashboards with Grafana in the Azure portal is a great way to get started with Grafana. If you have additional third-party data sources, or need full enterprise capabilities in Grafana, you can choose to upgrade to Azure Managed Grafana, a fully managed hosted service for the Grafana Enterprise software. See a detailed solution comparison of Dashboards with Grafana and Azure Managed Grafana here. Get started with Azure Monitor dashboards with Grafana today.
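Because these dashboards are described as portable, standard Grafana dashboards, it may help to see roughly what a dashboard definition looks like as data. The sketch below builds a minimal one-panel dashboard in the general shape of Grafana's dashboard JSON model. The `build_dashboard` helper, the data source UID, the schema version value, and the PromQL query are illustrative assumptions; how the JSON is embedded in an ARM or Bicep template is not covered by the announcement and is not shown here.

```python
import json


def build_dashboard(title: str, prometheus_uid: str) -> dict:
    """Build a minimal Grafana-style dashboard with one time series panel."""
    return {
        "title": title,
        "tags": ["aks", "example"],
        "timezone": "browser",
        # Assumed value; real exports carry the version of the Grafana that wrote them.
        "schemaVersion": 39,
        "panels": [
            {
                "id": 1,
                "type": "timeseries",
                "title": "CPU usage by namespace",
                # Grid position: Grafana lays panels out on a 24-column grid.
                "gridPos": {"h": 8, "w": 12, "x": 0, "y": 0},
                # Placeholder UID; use the UID of your own Prometheus data source.
                "datasource": {"type": "prometheus", "uid": prometheus_uid},
                "targets": [
                    {
                        "refId": "A",
                        # Illustrative PromQL over a common cAdvisor metric.
                        "expr": "sum by (namespace) "
                                "(rate(container_cpu_usage_seconds_total[5m]))",
                    }
                ],
            }
        ],
    }


if __name__ == "__main__":
    dashboard = build_dashboard("AKS CPU overview", "example-prometheus-uid")
    print(json.dumps(dashboard, indent=2))
```

Keeping the definition as plain JSON is what makes it portable: the same document could, in principle, be imported into a self-hosted Grafana, provided a matching data source exists there.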
KayodePrince
Nov 18, 2025 | Azure Observability Blog
Simplify Application Monitoring for AKS with Azure Monitor (Public Preview)
As cloud-native workloads scale, customers increasingly expect application and infrastructure observability to be unified, automated, and DevOps-friendly. Azure Monitor is advancing this vision with Application Monitoring for Azure Kubernetes Service (AKS), now in Public Preview with seamless onboarding and troubleshooting experiences in the Azure portal. This new capability brings first-class OpenTelemetry support, seamless onboarding from the AKS cluster blade, and auto-instrumentation and auto-configuration options that make it easier than ever to collect application performance data into Azure Monitor and Application Insights, without modifying application code or maintaining custom agents.

Enable application monitoring for your AKS-deployed apps directly from the Azure portal in two steps:
1. Enable application monitoring for the AKS cluster in Monitor Settings.
2. Choose the namespaces for application monitoring and configure namespace-wide onboarding to route application signals to an Application Insights resource.
Optionally, leverage Custom Resource Definitions (CRDs) for more granular enablement and per-deployment onboarding.

Feature Highlights

Auto-instrumentation
Auto-instrument Java and Node.js applications without code changes. This approach instruments workloads with the Azure Monitor OpenTelemetry Distro and routes telemetry to Application Insights. Add-on enablement and namespace configuration are now available in both the CLI and the Azure portal.

Unified Monitoring and Troubleshooting
Switch seamlessly between infrastructure and application layers with improved navigation between Container Insights and Application Insights, curated OpenTelemetry workbooks, and Azure-curated Grafana dashboards. When looking into your deployment controllers from Container Insights, you can also see application performance metrics alongside them to identify problematic requests or failures. From there, you can transition to Application Insights for a more detailed diagnosis.
(Figure: View your application performance next to your infrastructure metrics in Container Insights)

Full-Stack Dashboards with Grafana
This new application monitoring capability becomes even more powerful when paired with Dashboards with Grafana for Azure Monitor. With curated, Azure-hosted Grafana dashboards built specifically for Application Insights and OpenTelemetry data, teams can extend their AKS application monitoring experience with rich, full-stack visualizations tailored for cloud-native workloads.
(Figure: Application monitoring dashboards available through Dashboards with Grafana)
These dashboards allow you to:
- Bring application traces, requests, dependencies, and exception data from Application Insights into Grafana dashboards optimized for app-centric troubleshooting.
- Correlate application performance with AKS infrastructure metrics, including node, pod, and container health, to rapidly identify cross-layer issues.
- Visualize OpenTelemetry signals flowing through Azure Monitor in a unified, standards-based format without needing to build dashboards from scratch.
- Customize and extend dashboards with your own OTel metrics or additional Application Insights dimensions for deeper app performance analytics.
By combining Application Monitoring for AKS with Dashboards with Grafana, developers and operators gain a complete, end-to-end view of application behavior, making it faster and easier to diagnose issues, validate deployments, and understand the health of microservices running on AKS.

Call to Action
Start simplifying application observability today with Azure Monitor for AKS. Unify your metrics, logs, and traces in a single monitoring experience powered by OpenTelemetry and Azure Monitor.
Explore the documentation and get started: https://learn.microsoft.com/azure/azure-monitor/app/kubernetes-codeless
Learn more about our new features for OpenTelemetry in Azure Monitor: https://aka.ms/igniteotelblog
austonli
Nov 18, 2025 | Azure Observability Blog
Troubleshoot with OTLP signals in Azure Monitor (Limited Public Preview)
As organizations increasingly rely on distributed cloud-native applications, the need for comprehensive standards-based observability has never been greater. OpenTelemetry (OTel) has emerged as the industry standard for collecting and transmitting telemetry data, enabling unified monitoring across diverse platforms and services. Microsoft is among the top contributors to OpenTelemetry. Azure Monitor is expanding its support for the OTel standard with this preview, empowering developers and operations teams to seamlessly capture, analyze, and act on critical signals from their applications and infrastructure. With this limited preview (sign-up here), regardless of where your applications are running, you can channel the OpenTelemetry Protocol (OTLP) logs, metrics and traces to Azure Monitor directly. On Azure compute platforms, we have simpler collection orchestration that also unifies application and infrastructure telemetry collection with the Azure Monitor collection offerings for VM/VMSS or AKS. On Azure VMs/VMSS (or any Azure Arc supported compute), you can use the Azure Monitor Agent (AMA) that you are already using to collect infrastructure logs. On AKS, the Azure Monitor add-ons that orchestrate Container Insights and managed Prometheus, will also auto configure the collection of OTLP signals from your applications (or auto-instrument with Azure Monitor OTel Distro for supported languages). On these platforms or anywhere else, you can choose to use OpenTelemetry Collector, and channel the OTLP signals from your OTel SDK instrumented application directly to Azure Monitor cloud ingestion endpoints. OTLP metrics will be stored in Azure Monitor Workspace, a Prometheus metrics store. Logs and traces will be stored in Azure Monitor Log Analytics Workspace in an OTel semantic conventions-based schema. 
Application Insights experiences will light up, enabling all distributed tracing and troubleshooting experiences powered by Azure Monitor, as well as out-of-the-box Dashboards with Grafana from the community. With this preview, we are also extending auto-instrumentation support for applications on AKS to .NET and Python, and introducing OTLP metrics collection from all auto-instrumented applications (Java/Node/.NET/Python). Sign up for the preview here: https://aka.ms/azuremonitorotelpreview.
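To make the OTLP signals mentioned above more concrete, here is a minimal sketch of what a single trace span looks like in the OTLP/HTTP JSON encoding. It only builds the request body and does not send anything. The service name, span name, and scope name are made up for illustration, and the Azure Monitor ingestion endpoint and authentication are not shown because the post does not specify them. In practice an OTel SDK or the OpenTelemetry Collector would produce and ship this payload for you.

```python
import json
import secrets
import time


def build_otlp_trace_payload(service_name: str, span_name: str, duration_ms: int) -> dict:
    """Build a one-span request body in the OTLP/HTTP JSON encoding."""
    end_ns = time.time_ns()
    start_ns = end_ns - duration_ms * 1_000_000
    return {
        "resourceSpans": [{
            # Resource attributes describe the entity producing telemetry.
            "resource": {
                "attributes": [
                    {"key": "service.name", "value": {"stringValue": service_name}}
                ]
            },
            "scopeSpans": [{
                # Hypothetical instrumentation scope name.
                "scope": {"name": "example.manual.instrumentation"},
                "spans": [{
                    "traceId": secrets.token_hex(16),  # 16 bytes as 32 hex chars
                    "spanId": secrets.token_hex(8),    # 8 bytes as 16 hex chars
                    "name": span_name,
                    "kind": 2,  # SPAN_KIND_SERVER
                    # 64-bit integers are encoded as strings in OTLP JSON.
                    "startTimeUnixNano": str(start_ns),
                    "endTimeUnixNano": str(end_ns),
                }],
            }],
        }]
    }


if __name__ == "__main__":
    body = build_otlp_trace_payload("checkout", "GET /cart", duration_ms=25)
    print(json.dumps(body, indent=2))
```

An OTLP/HTTP receiver conventionally accepts a body like this as a POST to a path ending in /v1/traces with Content-Type application/json; logs and metrics use analogous resourceLogs and resourceMetrics envelopes.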
SoubhagyaDash
Nov 18, 2025 Place Azure Observability Blog
374Views
1like
0Comments
Azure Copilot observability agent: Intelligent Investigations Across Your Azure Stack
Cloud operations require more than reactive troubleshooting; they demand intelligent observability that scales across resources and interfaces and provides actionable insights when services are not operating as expected. We are introducing the Azure Copilot observability agent, which delivers on this promise. The Azure Copilot observability agent builds on what was previously known as the Azure Monitor investigation capability and introduces a streamlined experience, combining the power of agentic investigations with expanded capabilities for deeper visibility and faster resolution. Smarter insights, faster recovery, deeper visibility across your Azure stack.
What it is
The Azure Copilot observability agent works within your Azure workflows to make troubleshooting faster and smarter. It helps you:
- Automatically isolate problems in complex applications across the stack.
- Detect and correlate anomalies from metrics, logs, and other observability signals to help identify the cause of an issue.
- Correlate data from multiple sources for full context.
- Generate actionable findings and next steps described in clear human language.
- Preserve results for collaboration and tracking.
Integrated with alerts, the Azure portal, and Azure Copilot (gated preview), the Azure Copilot observability agent ensures investigations are seamless and actionable.
How it works
When you get an alert and need to investigate it quickly and take action, simply click the ‘Investigate’ button. Next, you’ll see a list of AI-generated findings to select from. Each finding suggests possible causes behind what went wrong and offers a starting point for troubleshooting. To better understand the summary, you can easily access the supporting data. Behind the scenes, the observability agent uses AI, including machine learning models for anomaly detection and correlation and large language models (LLMs), to deliver these insights.
Expanded intelligence for critical resources
The Azure Copilot observability agent now delivers intelligent, AI-driven investigations across your Azure stack, from application services down to the underlying infrastructure. It automatically scopes from the resource to dependent components and infrastructure layers, correlating metrics, logs, and health signals for deeper visibility and faster root cause analysis. This includes support across a customer’s application services and critical Azure resources such as Virtual Machines (VMs), Azure Kubernetes Service (AKS) clusters, and more, providing full-stack coverage for complex environments. For these environments, investigations leverage multiple analysis types to deliver deeper insights:
- Metric analysis - detect abnormal CPU, memory, or network utilization patterns in VMs and AKS nodes, helping identify resource pressure before it impacts workloads.
- Recent alerts correlation - when a spike in AKS pod restarts occurs, the observability agent correlates with recent alerts to highlight cascading issues across cluster components.
- Resource health checks - surface health signals for VMs and AKS nodes alongside anomaly findings, enabling operators to validate whether infrastructure degradation is contributing to application instability.
- Resource diagnostics tools integration - findings are automatically connected to built-in Azure diagnostics for quick validation and remediation steps without leaving the investigation workflow.
- Log-based metric analysis - for AKS and VM environments, enrich metric anomaly detection with contextual tags and data derived from logs, enabling more precise root cause identification.
Extended regional availability
The Azure Copilot observability agent is now supported in most Azure regions, so you can leverage its capabilities wherever your workloads run.
Copilot support
With Copilot, you can instantly interact with your alerts in a natural way.
Just ask questions like ‘Show me my critical alerts’ or ‘Which alerts need my attention?’ and Copilot will surface a clear list of alerts for you. From there, simply click an alert to view its details and access the Investigate button, your gateway to the Azure Copilot observability agent. With one click, you can dive deeper, uncover potential root causes, and get actionable insights to resolve issues faster.
Looking ahead
The Azure Copilot observability agent is evolving toward a broader role in your observability strategy. While today it focuses on investigations, our roadmap aims to make investigations even smarter and more actionable. Future releases will also expand into advanced scenarios, such as correlating issues and managing monitoring configurations without adding complexity.
Start using the Azure Copilot observability agent today
Available in preview, the Azure Copilot observability agent is integrated into your existing Azure workflows. Access it from alerts, the Azure portal, or Azure Copilot (gated preview) and experience a smarter way to resolve issues. Learn more: see the documentation for full details on capabilities and setup. We’re committed to evolving the observability agent based on your feedback. Share your thoughts via azmoninvestigation@microsoft.com or through the Give Feedback form in the experience.
Don’t Miss What’s Next
Ignite Session: Unlock cloud-scale observability and optimization with Azure
December Webinar: Updates, best practices, and live Q&A. Register to secure your spot!
NEW Deep Preview!
In parallel with this preview, we are starting a preview of new investigation capabilities that enable deeper and more precise investigation insights. We have equipped the Azure Copilot observability agent with deep agentic reasoning and natural-language dialog with the developer, enabling deep, interactive investigation of issues. Sign-up is open for this preview.
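The post does not describe how the agent's metric analysis works internally, so the following is only a generic illustration of the idea of flagging abnormal utilization: a trailing-window z-score check over CPU samples. The function name, window size, and threshold are assumptions chosen for the example, not anything taken from the product.

```python
from statistics import mean, stdev


def find_anomalies(samples: list[float], window: int = 5, threshold: float = 3.0) -> list[int]:
    """Return indexes of samples that deviate sharply from the trailing window.

    A sample is flagged when it lies more than `threshold` standard
    deviations from the mean of the `window` samples before it.
    """
    anomalies = []
    for i in range(window, len(samples)):
        history = samples[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Perfectly flat history: treat any change at all as an outlier.
            is_outlier = samples[i] != mu
        else:
            is_outlier = abs(samples[i] - mu) / sigma > threshold
        if is_outlier:
            anomalies.append(i)
    return anomalies


if __name__ == "__main__":
    # Hypothetical CPU utilization percentages for one node.
    cpu_percent = [41, 43, 42, 44, 42, 43, 95, 44, 42]
    print(find_anomalies(cpu_percent))  # prints [6]
```

A trailing window keeps the check cheap, but a single spike inflates the spread of the following windows, so production systems typically use more robust statistics or learned baselines.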
EfratNauerman
Nov 18, 2025 Place Azure Observability Blog
472Views
0likes
0Comments
Introducing Monitoring Coverage: Assess and Improve Your Monitoring Posture at Scale
As organizations grow their Azure footprint, ensuring consistent monitoring coverage across resources becomes increasingly important. The new Monitoring Coverage (preview) feature in Azure Monitor provides a single, centralized experience to assess, configure at scale, and enhance monitoring across your environment.
A unified view of your monitoring health
Monitoring Coverage consolidates insights from Azure Advisor to highlight where monitoring can be improved. You can see which Azure resources already have basic out-of-the-box telemetry enabled and which could benefit from additional recommended settings, helping you close gaps in your observability strategy at scale.
Key capabilities
- Comprehensive visibility: Get an overview of monitoring coverage across common Azure resource types.
- Actionable recommendations: Identify and apply Azure Advisor recommendations at scale to strengthen your monitoring posture.
- Centralized configuration: Enable recommended monitoring settings for multiple resources from a single pane of glass.
- Detailed resource insights: Explore individual resource details to review active monitoring configurations and applicable recommendations.
How to access
In the Azure portal, open Azure Monitor. Under the Settings section of the left navigation, select Monitoring Coverage (preview). You can scope the view using standard Azure filters (Subscriptions, Resource groups, Tags, Locations, and Resource types), allowing you to focus on the resources you manage.
Supported resource types
During preview, Monitoring Coverage supports Virtual Machines (VMs) and Azure Kubernetes Service (AKS) clusters. Support for additional Azure services will roll out in future updates.
Overview tab
The Overview tab provides a snapshot of your overall monitoring landscape, showing which resources have:
- Basic monitoring: Default metrics and logs enabled upon creation.
- Enhanced monitoring: Microsoft-recommended configurations for deeper insights and improved observability.
This view makes it easy to identify coverage gaps and take quick action to enable enhanced monitoring, which may incur additional cost depending on your configuration.
Streamlined enablement experience
When you choose to enable monitoring:
- The Enablement screen lists all resources included in the operation. You can deselect specific resources if needed.
- Selecting View details and configure allows customization by resource type, for example, selecting a Log Analytics workspace.
- The Review and Enable tab summarizes all changes before they are applied.
Once enabled, data typically begins flowing to the designated workspace within 30–60 minutes. During this preview, you can enable monitoring for up to 100 resources at a time, and an existing Log Analytics workspace or Azure Monitor Workspace is required.
Monitoring Details page
For a deeper look, the Monitoring Details page lets you:
- View resources as a list or group them by recommendation.
- Filter using standard Azure filters.
- See the Monitoring coverage column summarizing enabled recommendations and data collection rules.
- Enable individual monitoring settings directly from this view when managing resources one at a time.
Share your feedback
We’re actively evolving Monitoring Coverage based on user input. To share your feedback or suggest new capabilities, use the Feedback link at the top of the page in the Azure portal. Your insights will help shape the future of Azure Monitor. Try Monitoring Coverage (preview) today in the Azure portal to assess your observability coverage and take the next step toward proactive, consistent monitoring across your Azure environment.
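Because the preview enables monitoring for at most 100 resources per operation, a larger estate has to be worked through in groups. The sketch below shows one way to plan those groups from a list of resource IDs. The helper name and the example IDs are hypothetical, and the enablement itself still happens in the portal experience described above.

```python
from typing import Iterator

MAX_RESOURCES_PER_OPERATION = 100  # preview limit stated in the post


def batch_resources(
    resource_ids: list[str], size: int = MAX_RESOURCES_PER_OPERATION
) -> Iterator[list[str]]:
    """Yield consecutive groups of at most `size` resource IDs."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(resource_ids), size):
        yield resource_ids[start:start + size]


if __name__ == "__main__":
    # Hypothetical VM resource IDs, shortened for readability.
    vm_ids = [f"/subscriptions/example/virtualMachines/vm-{n:03d}" for n in range(250)]
    print([len(batch) for batch in batch_resources(vm_ids)])  # prints [100, 100, 50]
```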
Nathan_Mangum
Nov 18, 2025 Place Azure Observability Blog
228Views
2likes
0Comments
Reimagining AI Ops with Azure SRE Agent: New Automation, Integration, and Extensibility features
Azure SRE Agent offers intelligent, context-aware automation for IT operations. Enhanced by customer feedback from our preview, the SRE Agent has evolved into an extensible platform to automate and manage tasks across Azure and other environments. Built on an Agentic DevOps approach - drawing from proven practices in internal Azure operations - the Azure SRE Agent has already saved over 20,000 engineering hours across Microsoft product teams’ operations, delivering strong ROI for teams seeking sustainable AIOps.
An Operations Agent that adapts to your playbooks
Azure SRE Agent is an AI-powered operations automation platform that empowers SREs, DevOps, IT operations, and support teams to automate tasks such as incident response, customer support, and developer operations from a single, extensible agent. Its value proposition and capabilities have evolved beyond diagnosis and mitigation of Azure issues to automating operational workflows and integrating with the standards and processes used in your organization. SRE Agent is designed to automate operational work and reduce toil, enabling developers and operators to focus on high-value tasks. By streamlining repetitive and complex processes, SRE Agent accelerates innovation and improves reliability across cloud and hybrid environments. In this article, we will look at what’s new and what has changed since the last update.
What’s New: Automation, Integration, and Extensibility
Azure SRE Agent just got a major upgrade. From no-code automation to seamless integrations and expanded data connectivity, here’s what’s new in this release:
- No-code Sub-Agent Builder: Rapidly create custom automations without writing code.
- Flexible, event-driven triggers: Instantly respond to incidents and operational changes.
- Expanded data connectivity: Unify diagnostics and troubleshooting across more data sources.
- Custom actions: Integrate with your existing tools and orchestrate end-to-end workflows via MCP.
- Prebuilt operational scenarios: Accelerate deployment and improve reliability out of the box.
Unlike generic agent platforms, Azure SRE Agent comes with deep integrations, prebuilt tools, and frameworks specifically for IT, DevOps, and SRE workflows. This means you can automate complex operational tasks faster and more reliably, tailored to your organization’s needs.
Sub-Agent Builder: Custom Automation, No Code Required
Empower teams to automate repetitive operational tasks without coding expertise, dramatically reducing manual workload and development cycles. This feature helps address the need for targeted automation, letting teams solve specific operational pain points without relying on one-size-fits-all solutions.
- Modular Sub-Agents: Easily create custom sub-agents tailored to your team’s needs. Each sub-agent can have its own instructions, triggers, and toolsets, letting you automate everything from outage response to customer email triage.
- Prebuilt System Tools: Eliminate the inefficiency of creating basic automation from scratch, and choose from a rich library of hundreds of built-in tools for Azure operations, code analysis, deployment management, diagnostics, and more.
- Custom Logic: Align automation to your unique business processes by defining your automation logic and prompts, teaching the agent to act exactly as your workflow requires.
Flexible Triggers: Automate on Your Terms
Invoke the agent to respond automatically to mission-critical events rather than waiting for manual commands. This feature helps speed up incident response and eliminate missed opportunities for efficiency.
- Multi-Source Triggers: Go beyond chat-based interactions, and trigger the agent to automatically respond to Incident Management and Ticketing systems like PagerDuty and ServiceNow, Observability Alerting systems like Azure Monitor Alerts, or even on a cron-based schedule for proactive monitoring and best-practices checks.
Additional trigger sources such as GitHub issues, Azure DevOps pipelines, and email will be added over time. This means automation can start exactly when and where you need it.
- Event-Driven Operations: Integrate with your CI/CD, monitoring, or support systems to launch automations in response to real-world events - like deployments, incidents, or customer requests. Vital for reducing downtime, it ensures that business-critical actions happen automatically and promptly.
Expanded Data Connectivity: Unified Observability and Troubleshooting
Integrate data, enabling comprehensive diagnostics and troubleshooting and faster, more informed decision-making by eliminating silos and speeding up issue resolution.
- Multiple Data Sources: The agent can now read data from Azure Monitor, Log Analytics, and Application Insights based on its Azure role-based access control (RBAC). Additional observability data sources such as Dynatrace, New Relic, Datadog, and more can be added via the Remote Model Context Protocol (MCP) servers for these tools. This gives you a unified view for diagnostics and automation.
- Knowledge Integration: Rather than manually detailing every instruction in your prompt, you can upload your Troubleshooting Guide (TSG) or Runbook directly, allowing the agent to automatically create an execution plan from the file. You may also connect the agent to resources like SharePoint, Jira, or documentation repositories through Remote MCP servers, enabling it to retrieve needed files on its own. This approach utilizes your organization’s existing knowledge base, streamlining onboarding and enhancing consistency in managing incidents.
Azure SRE Agent is also building multi-agent collaboration by integrating with PagerDuty and Neubird, enabling advanced, cross-platform incident management and reliability across diverse environments.
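The sub-agent and trigger concepts described above can be pictured as a small data model: each sub-agent carries its own instructions, trigger sources, and tools, and an incoming event is routed to every sub-agent subscribed to its source. This is a conceptual sketch only. The class, field names, and trigger labels are invented for illustration and do not reflect the Sub-Agent Builder's actual configuration format, which is no-code.

```python
from dataclasses import dataclass, field


@dataclass
class SubAgent:
    """Hypothetical model of a sub-agent: instructions, triggers, toolset."""
    name: str
    instructions: str
    triggers: set[str] = field(default_factory=set)
    tools: list[str] = field(default_factory=list)


def route_event(source: str, agents: list[SubAgent]) -> list[str]:
    """Return the names of all sub-agents subscribed to the event source."""
    return [agent.name for agent in agents if source in agent.triggers]


if __name__ == "__main__":
    agents = [
        SubAgent(
            name="outage-responder",
            instructions="Diagnose the alert and propose a mitigation.",
            triggers={"pagerduty", "azure-monitor-alert"},
            tools=["kubectl", "log-query"],
        ),
        SubAgent(
            name="daily-best-practice-check",
            instructions="Review cluster settings against best practices.",
            triggers={"schedule"},
        ),
    ]
    print(route_event("pagerduty", agents))  # prints ['outage-responder']
```

Keeping triggers as a set per sub-agent means several sub-agents can react to the same source without any change to the routing function.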
Custom Actions: Automate Anything, Anywhere
Extend automation beyond Azure and integrate with any tool or workflow, solving the problem of limited automation scope and enabling end-to-end process orchestration.
- Out-of-the-Box Actions: Instantly automate common tasks like running Azure CLI or kubectl commands, creating GitHub issues, or updating Azure resources, reducing setup time and operational overhead.
- Communication Notifications: The SRE Agent now features built-in connectors for Outlook, enabling automated email notifications, and for Microsoft Teams, allowing it to post messages directly to Teams channels for streamlined communication.
- Bring Your Own Actions: Drop in your own Remote MCP servers to extend the agent’s capabilities to any custom tool or workflow. Future-proof your agentic DevOps by automating proprietary or emerging processes with confidence.
Prebuilt Operations Scenarios
Address common operational challenges out of the box, saving teams time and effort while improving reliability and customer satisfaction.
- Incident Response: Minimize business impact and reduce operational risk by automating detection, diagnosis, and mitigation of your workload stack. The agent has built-in runbooks for common issues related to many Azure resource types, including Azure Kubernetes Service (AKS), Azure Container Apps (ACA), Azure App Service, Azure Logic Apps, Azure Database for PostgreSQL, Azure Cosmos DB, and Azure VMs. Support for additional resource types is being added continually; see the product documentation for the latest information.
- Root Cause Analysis & IaC Drift Detection: Instantly pinpoint incident causes with AI-driven root cause analysis including automated source code scanning via GitHub and Azure DevOps integration. Proactively detect and resolve infrastructure drift by comparing live cloud environments against source-controlled IaC, ensuring configuration consistency and compliance.
- Handle Complex Investigations: Enable the deep investigation mode that uses a hypothesis-driven method to analyze possible root causes. It collects logs and metrics, tests hypotheses with iterative checks, and documents findings. The process delivers a clear summary and actionable steps to help teams accurately resolve critical issues.
- Incident Analysis: The integrated dashboard offers a comprehensive overview of all incidents managed by the SRE Agent. It presents essential metrics, including the number of incidents reviewed, assisted, and mitigated by the agent, as well as those awaiting human intervention. Users can leverage aggregated visualizations and AI-generated root cause analyses to gain insights into incident processing, identify trends, enhance response strategies, and detect areas for improvement in incident management.
- Inbuilt Agent Memory: The new SRE Agent Memory System transforms incident response by institutionalizing the expertise of top SREs - capturing, indexing, and reusing critical knowledge from past incidents, investigations, and user guidance. Benefit from faster, more accurate troubleshooting, as the agent learns from both successes and mistakes, surfacing relevant insights, runbooks, and mitigation strategies exactly when needed. This system leverages advanced retrieval techniques and a domain-aware schema to ensure every on-call engagement is smarter than the last, reducing mean time to resolution (MTTR) and minimizing repeated toil. Automatically gain a continuously improving agent that remembers what works, avoids past pitfalls, and delivers actionable guidance tailored to the environment.
- GitHub Copilot and Azure DevOps Integration: Automatically triage, respond to, and resolve issues raised in GitHub or Azure DevOps. Integration with modern development platforms such as GitHub Copilot coding agent increases efficiency and ensures that issues are resolved faster, reducing bottlenecks in the development lifecycle.
Ready to get started?
Azure SRE Agent home page
Product overview
Pricing Page
Pricing Calculator
Pricing Blog
Demo recordings
Deployment samples
What’s Next?
Give us feedback: Your feedback is critical. You can Thumbs Up / Thumbs Down each interaction or thread, use the “Give Feedback” button in the agent to give us in-product feedback, or create issues or share your thoughts in our GitHub repo at https://github.com/microsoft/sre-agent. We’re just getting started. In the coming months, expect even more prebuilt integrations, expanded data sources, and new automation scenarios. We anticipate continuous growth and improvement throughout our agentic AI platforms and services to effectively address customer needs and preferences. Let us know what Ops toil you want to automate next!
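The IaC drift detection scenario mentioned above comes down to comparing what source control declares with what is actually deployed. The sketch below shows that comparison on plain nested dictionaries. It is a generic illustration, not the agent's implementation, and the property names in the example are hypothetical stand-ins for whatever an IaC template and a live resource query would return.

```python
def detect_drift(declared: dict, live: dict, path: str = "") -> list[str]:
    """Recursively compare declared (IaC) settings with live settings."""
    findings = []
    for key in sorted(set(declared) | set(live)):
        where = f"{path}.{key}" if path else key
        if key not in live:
            findings.append(f"missing in live: {where}")
        elif key not in declared:
            findings.append(f"unmanaged in live: {where}")
        elif isinstance(declared[key], dict) and isinstance(live[key], dict):
            # Descend into nested settings and keep the dotted path.
            findings.extend(detect_drift(declared[key], live[key], where))
        elif declared[key] != live[key]:
            findings.append(
                f"changed: {where} (declared={declared[key]!r}, live={live[key]!r})"
            )
    return findings


if __name__ == "__main__":
    # Hypothetical cluster settings: what the template says vs. what is running.
    declared = {"sku": "Standard", "nodeCount": 3, "tags": {"env": "prod"}}
    live = {"sku": "Standard", "nodeCount": 5, "tags": {"env": "prod", "owner": "ops"}}
    for finding in detect_drift(declared, live):
        print(finding)
```

For the sample data this reports the changed nodeCount and the extra tags.owner entry; an empty result means the live state matches the declared state.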
vyomnagrani
Nov 18, 2025 Place Apps on Azure Blog
1.3KViews
0likes
0Comments