azure monitor

1353 Topics

Export historical data from Log Analytics workspace with Export Job (preview)
Log Analytics Export Job is now available in public preview. It gives you a straightforward way to export historical log data from your workspace to Azure Blob Storage, without writing custom scripts or disrupting live operations. You submit a job with a query and a time range, and the service handles the rest asynchronously.

Historical data had no built-in exit path

Your Log Analytics workspace accumulates months, sometimes years, of telemetry. That data has real value beyond the workspace: training security models, satisfying compliance requirements, supporting forensic investigations with external tools, or migrating to a new analytics platform. The challenge has always been getting it out. Log Analytics supports continuous data export for ongoing ingestion, but that doesn’t help with data that already exists. Teams that needed to export historical data had to build their own solutions: scripted query loops, Logic Apps, or Azure Functions calling the query API in batches and stitching results into storage. These approaches were slow, brittle, and hard to operationalize at scale. Export Job closes that gap.

One job per table, across Analytics and Basic tiers

You target a specific table, define a KQL filter on that table, and set a time range. The job exports the matching data, whether it sits in the Analytics or Basic tier, and writes the results directly to your storage account as Parquet files.

Figure: End-to-end flow of a Log Analytics Export Job

You can filter with KQL to scope the export to exactly the columns and records you need, reducing cost and downstream processing time. Output is gzip-compressed Parquet, the standard columnar format for data lakes, Spark, Azure Data Explorer, and most ML frameworks, with no conversion step required. Data is written to hourly folders in your blob storage. Billing is based on two existing meters: data scanned, charged at existing Log Analytics scan rates, and data volume exported, as measured in your storage account.
Resilient execution

Large exports can be interrupted by network issues, transient storage errors, or downtime. Export Job includes a built-in retry mechanism that recovers from these interruptions automatically. The service splits the job into hourly bins, each tracked and written independently to your storage container. Transient failures are retried without any action on your part. If a bin still fails after retries are exhausted, or the job reaches its seven-day timeout, you can retry it manually within 7 days of job completion, without restarting the entire job or re-exporting data that already completed successfully. Before a retry writes new data, any partial output from the failed bin is automatically cleaned up, so there is no risk of duplicates in your storage account.

Getting started

Log Analytics Export Job is available in public preview today. Configuration is programmatic through the Azure Monitor REST API, which lets you create jobs, check their status, cancel them, and retry them.

Before your first job:

- Enable the workspace Managed Identity in your Log Analytics workspace settings.
- Assign the Storage Blob Data Contributor and Log Analytics Reader roles to the workspace Managed Identity on your destination storage account.
- Ensure the destination storage account is in the same Azure region as the workspace (cross-region support is on the roadmap).
- Enable the Jobs category in your workspace’s diagnostic settings to route job execution records to the LAJobLogs table. This gives you creation time, job parameters, and bin-level status for every job you run.
- Assess the expected export volume and run duration using the suggested query in the Export Job article.

Consider the Export Job boundaries:

- The maximum time range per job is one year.
- The maximum run duration per job is seven days. When a job reaches this limit because of data volume, you can retry it to continue the export from where it stopped.
- Five concurrent jobs are supported.

Once prerequisites are in place, create a job with a single API call:

    POST https://api.loganalytics.azure.com/v2/subscriptions/{subscriptionId}/resourcegroups/{resourcegroup}/providers/Microsoft.OperationalInsights/workspaces/{workspace}/jobs/export?api-version=2023-09-01-preview
    Authorization: {credential}
    content-type: application/json

    {
      "startTime": "2025-01-01T00:00:00Z",
      "endTime": "2025-06-30T23:59:59Z",
      "query": "{query}",
      "destinationStorageAccounts": [
        "/subscriptions/{subscriptionId}/resourceGroups/{resourceGroupName}/providers/Microsoft.Storage/storageAccounts/{storageAccountName}"
      ],
      "containerName": "{containerName}",
      "outputDataFormat": "Parquet",
      "dateTimeFormat": "yyyy-MM-ddTHH"
    }

Copy the job ID returned in the response. You can use it to poll status, cancel the job, or retry individual failed bins.

Learn more: https://aka.ms/LogsExportJob

Share your feedback as we continue to improve the feature.
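The request above can be assembled and sanity-checked before anything is sent. The Python sketch below is one way to do that, under stated assumptions: the URL shape, API version, and body fields are copied from the example request in this post, while `build_export_job`, `hourly_bin_count`, the 366-day reading of the one-year limit, and the hour-aligned bin counting are illustrative choices rather than documented service behavior. Nothing is sent over the network.

```python
import math
from datetime import datetime, timedelta, timezone

API_VERSION = "2023-09-01-preview"   # copied from the example request in the post
MAX_RANGE = timedelta(days=366)      # assumption: "one year" read as at most 366 days


def build_export_job(subscription_id, resource_group, workspace,
                     storage_account_id, container, query, start, end):
    """Return (url, body) for the create-job call. Hypothetical helper; sends nothing."""
    if end <= start:
        raise ValueError("endTime must be after startTime")
    if end - start > MAX_RANGE:
        raise ValueError("time range exceeds the one-year limit per job")

    url = (
        "https://api.loganalytics.azure.com/v2"
        f"/subscriptions/{subscription_id}/resourcegroups/{resource_group}"
        "/providers/Microsoft.OperationalInsights"
        f"/workspaces/{workspace}/jobs/export?api-version={API_VERSION}"
    )
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    body = {
        "startTime": start.astimezone(timezone.utc).strftime(fmt),
        "endTime": end.astimezone(timezone.utc).strftime(fmt),
        "query": query,
        "destinationStorageAccounts": [storage_account_id],
        "containerName": container,
        "outputDataFormat": "Parquet",
        "dateTimeFormat": "yyyy-MM-ddTHH",
    }
    return url, body


def hourly_bin_count(start, end):
    """Rough number of hourly bins a range touches (assumes hour-aligned bins)."""
    first_hour = start.replace(minute=0, second=0, microsecond=0)
    return math.ceil((end - first_hour).total_seconds() / 3600)


if __name__ == "__main__":
    start = datetime(2025, 1, 1, tzinfo=timezone.utc)
    end = datetime(2025, 6, 30, 23, 59, 59, tzinfo=timezone.utc)
    url, body = build_export_job(
        "my-subscription", "my-rg", "my-workspace",          # placeholder names
        "/subscriptions/my-subscription/resourceGroups/my-rg"
        "/providers/Microsoft.Storage/storageAccounts/mystorage",
        "exports", "Heartbeat", start, end,
    )
    print(url)
    print(hourly_bin_count(start, end), "hourly bins")
```

Counting bins up front gives a feel for how much work a job represents and how many bin-level records to expect in LAJobLogs; treat the number as an estimate, since the post does not spell out how bins are aligned.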
YossiY
Jul 07, 2026 Place Azure Observability Blog
Find anomalies in Prometheus and OpenTelemetry metrics with Dynamic Thresholds (Preview)
Dynamic thresholds are extended to query-based metric alerts in Azure Monitor, allowing you to detect and alert on anomalies in Azure Monitor managed Prometheus metrics and OpenTelemetry metrics stored in an Azure Monitor Workspace. This follows the introduction of Dynamic Thresholds for Log search alerts. Azure Monitor now offers consistent Dynamic Thresholds support across platform metrics, log search queries, and query-based metric alerts: a consistent anomaly-detection approach, wherever your signals live.

Dynamic thresholds are not a single static formula. They apply a range of machine-learning models and algorithms to historical query results, learn each series’ normal rhythm (including hourly, daily, and weekly seasonality), and automatically fit the most appropriate baseline separately to every time series. This way, a single alert rule can monitor many resources or dimensions while each one gets its own independent, self-refining baseline.

Why Dynamic Thresholds Matter

- Simpler configuration: Reduce the need to define, maintain, and continuously tune static thresholds inside PromQL alert logic.
- Adaptive monitoring: Let alert thresholds adjust to changing workload behavior, recurring traffic peaks, and seasonal usage patterns.
- At-scale intelligence: Monitor multiple time series with a single alert rule, while Azure Monitor learns an independent baseline for each resource or dimension combination.

Example 1: Spot CPU anomalies in AKS workloads

Scenario: Monitor container CPU utilization across pods or deployments in AKS with a query-based metric alert built on Prometheus metrics.
Example query:

    sum by (microsoft_resource_id, namespace, deployment, container) (rate(container_cpu_usage_seconds_total[5m]))
    /
    sum by (microsoft_resource_id, namespace, deployment, container) (container_spec_cpu_quota / container_spec_cpu_period)

Why dynamic thresholds help: CPU usage of a Kubernetes workload changes with workload mix, deployment timing, scaling activity, and traffic patterns. Static thresholds can be difficult to tune across namespaces, deployments, and containers. Dynamic thresholds learn a separate baseline for each monitored time series (in this example, for every namespace, deployment, and container combination), so genuine CPU spikes stand out while expected variation from autoscaling and traffic mix stays quiet.

Example 2: Catch application latency regressions sooner

Scenario: Detect abnormal latency patterns in an application by alerting on custom OpenTelemetry metrics stored in an Azure Monitor Workspace.

Example query:

    histogram_quantile(0.95, sum by (le, service_name, http_route, http_method) (rate(http_server_duration_seconds_bucket[5m])))

Why dynamic thresholds help: Application latency naturally changes with traffic, user behavior, and release cadence. Fixed thresholds can be noisy during peak periods and too loose during quiet ones. Dynamic thresholds learn a separate baseline for each time series (here, for every service, route, and method), so real p95 latency regressions surface even as traffic and release cadence shift throughout the day.

Best practices for better results

To get the best results from dynamic thresholds for PromQL-based alerts, design your query so Azure Monitor can learn a clear, stable signal over time:

- Keep the expression numeric. Dynamic thresholds work best when the query returns a continuous numeric signal rather than a Boolean true/false result. For example, use an expression that calculates CPU usage, not a Boolean comparison like CPU > 0.8.
- Use meaningful dimensions. Split by dimensions such as namespace, deployment, service, or route when you want separate baselines for different workloads or endpoints.
- Prefer stable entities. Use longer-lived dimensions or aggregate across short-lived entities so the model has enough consistent history to learn from. In Kubernetes, for example, deployment is usually a better baseline dimension than individual pod ID.
- Choose the right threshold behavior. Decide whether the alert should trigger on values above the learned upper bound, below the lower bound, or both.
- Start with medium sensitivity. Use Medium as a balanced default, then tune up or down based on noise and missed anomalies.
- Allow enough historical data. Dynamic thresholds improve as more history is collected. Initial seasonal patterns use recent history, and weekly seasonality becomes more effective after several weeks of data.

Get started

Ready to try it? Create a query-based metric alert with dynamic thresholds on your metrics in Azure Monitor Workspace. You can create such rules in the Azure portal, where the built-in preview chart shows when your dynamic threshold alert would have fired based on historical baseline analysis. Use the preview chart to tune both the PromQL query and the dynamic threshold sensitivity before enabling the rule. You can also create query-based metric alert rules using programmatic interfaces or resource templates.

Figure 1. Dynamic thresholds preview chart showing the learned baseline and the points where an alert would have fired.

Dynamic thresholds cut alert noise where it starts: at detection. The alerts that do fire connect into Azure Monitor’s broader AIOps experience, where the Azure Copilot Observability Agent can help correlate signals into investigated issues with explainable reasoning, with humans in control.
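To make the per-series baseline idea concrete, here is a deliberately simple Python sketch. It is not the algorithm Azure Monitor uses (the service applies a range of models and accounts for seasonality); it only fits a mean ± k·σ band to a trailing window of each series, to show why the same value can be anomalous for one series and normal for another. The function names, window size, sensitivity factor, and sample data are all made up for illustration.

```python
from statistics import mean, pstdev


def learned_band(history, k=3.0):
    """Toy baseline: mean +/- k standard deviations of the recent history."""
    centre = mean(history)
    spread = pstdev(history)
    return centre - k * spread, centre + k * spread


def find_anomalies(values, window=12, k=3.0):
    """Return (index, value) pairs that fall outside the band learned
    from the `window` points immediately before them."""
    anomalies = []
    for i in range(window, len(values)):
        low, high = learned_band(values[i - window:i], k)
        if not low <= values[i] <= high:
            anomalies.append((i, values[i]))
    return anomalies


# Two hypothetical CPU-utilization series, keyed by (namespace, deployment).
series = {
    ("shop", "checkout"): [0.40, 0.42, 0.41, 0.39, 0.40, 0.43,
                           0.41, 0.40, 0.42, 0.41, 0.40, 0.42, 0.95],
    ("shop", "search"):   [0.85, 0.95, 0.88, 0.97, 0.86, 0.94,
                           0.87, 0.96, 0.85, 0.95, 0.88, 0.96, 0.95],
}

if __name__ == "__main__":
    for key, values in series.items():
        print(key, find_anomalies(values))
```

A static rule such as "alert above 0.9" would fire constantly for the search deployment and only after the fact for checkout. With a baseline per series, the final 0.95 is flagged for checkout, which normally sits near 0.41, and ignored for search, which routinely runs that high. This is also why the query should stay numeric: a Boolean comparison would leave nothing for a baseline to learn from.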
Next steps

- Related blog: Anomaly detection made easy with Dynamic thresholds for Log search alerts
- Dynamic thresholds in Azure Monitor
- Query-based metric alerts overview
- Create query-based metric alerts
- Prometheus metrics in Azure Monitor
- OpenTelemetry on Azure Monitor

Stay connected

Follow the Azure Observability Blog for more updates on Azure Monitor, Prometheus-based monitoring, alerting, and troubleshooting experiences. We’ll continue sharing product updates, practical guidance, and examples to help you improve observability across your Azure environments.

Feedback

We’d love to hear how dynamic thresholds for query-based metric alerts work for your scenarios. Share your feedback through your Microsoft account team, Azure support channels, or the feedback options in the Azure portal so we can continue improving the experience.
yairgil
Jul 02, 2026 Place Azure Observability Blog
Log Insights in Minutes: A Simpler pgBadger Workflow
Sometimes the fastest way to understand a PostgreSQL workload is not another dashboard. It is a good log report. pgBadger is a PostgreSQL log analysis tool that turns raw PostgreSQL logs into an interactive HTML report. It helps summarize query activity, connection patterns, errors, temporary files, lock waits, autovacuum activity, and more.

Earlier guidance for generating pgBadger reports from Azure Database for PostgreSQL Flexible Server focused on exporting logs through Diagnostic Settings, storing them in a storage account, and then using tools such as BlobFuse and jq to extract PostgreSQL log lines from JSON files. That workflow is still useful when customers centralize logs across multiple servers. However, if you are already using the Server logs feature in Azure Database for PostgreSQL Flexible Server, there is a much simpler path.

In this post: You’ll learn how to generate a pgBadger HTML report from Azure Database for PostgreSQL Flexible Server by downloading native PostgreSQL .log files directly from the Azure portal. No storage account, BlobFuse mount, or JSON extraction required.

Fast path

1. Configure log_line_prefix.
2. Enable Server logs for download.
3. Download the PostgreSQL .log files.
4. Run pgBadger with the matching prefix.
5. Open pgbadger-report.html.

Why use this workflow?

With Server logs, you can download native PostgreSQL .log files directly from the Azure portal and run pgBadger locally.
Older path: Diagnostic Settings → Storage account → BlobFuse → JSON extraction → pgBadger
Simpler path in this blog: Server logs → Download .log files → pgBadger

Area | Older Diagnostic Settings workflow | Server logs workflow
Export path | Diagnostic Settings to storage account | Download .log files directly from the portal
Format | JSON payloads need extraction | Native PostgreSQL .log files
Extra tooling | BlobFuse and jq JSON parsing | None
Best suited for | Centralized or multi-server logging | Quick per-server analysis
Outcome | Flexible, but more setup | Faster path to pgBadger

Recommended: Use the Server logs workflow when you want a fast, low-friction way to generate a pgBadger report from one Azure Database for PostgreSQL Flexible Server.

When should you use this workflow?

Use this workflow when... | Use Diagnostic Settings when...
You need a quick report for one Flexible Server. | You centralize logs from many servers.
You want to run pgBadger locally. | You need long-term retention or workspace-level querying.
You want to avoid JSON extraction. | You already have automated log export pipelines.

Before you start

- A machine where you can install or run pgBadger.
- A working Perl runtime.
- Git Bash on Windows, so the multi-line shell commands work as shown.
- Portal access to your Azure Database for PostgreSQL Flexible Server.
- Permission to update server parameters and enable Server logs.

Important: pgBadger can only analyze what PostgreSQL logs capture. To populate query timing and slow-query sections in the report, enable log_min_duration_statement before collecting logs. Logs collected before that change will not include duration data.
Workflow overview

Task | Type | Rough effort
Install or prepare pgBadger | One-time setup per analysis machine | 5–10 minutes
Configure log_line_prefix | One-time setup per server | 2–3 minutes
Enable Server logs | One-time setup per server | 2–3 minutes
Download logs and run pgBadger | Repeatable | 2–5 minutes

1. Install or prepare pgBadger on the machine where you will analyze logs.
2. Configure log_line_prefix so pgBadger can parse each log line.
3. Enable Server logs, so PostgreSQL logs are available for download.
4. Download the logs and run pgBadger locally.

Pro tip: Start with a narrow log window first. Use one or two hourly log files, confirm the report looks right, and then expand the analysis window if needed.

Step 1: Install pgBadger

Before generating a report, you need pgBadger available on the machine where you plan to analyze the downloaded PostgreSQL log files. Run this on a Linux VM, WSL, or another Linux-based environment where you can install packages.

Note: Azure Cloud Shell may work for quick testing, but package installation and build-tool availability can vary by session. For repeatable analysis, use a Linux VM, WSL, or another environment you control.

Copy and run:

    sudo apt-get update && sudo apt-get install -y git perl make gcc && \
    git clone https://github.com/darold/pgbadger.git && \
    cd pgbadger && \
    perl Makefile.PL && \
    make && \
    sudo make install && \
    pgbadger -V

What good looks like: The install command completes successfully and pgbadger -V returns the installed pgBadger version.

Step 2: Configure log_line_prefix

This is a one-time server configuration step. The log_line_prefix parameter controls the beginning of each PostgreSQL log line. pgBadger uses this prefix to extract useful fields such as timestamp, user, database, and process ID.

In the Azure portal, open your Flexible Server and go to Server parameters. Search for the parameter:

    log_line_prefix

Set this value:

    %m user=%u db=%d pid=%p:

Then select Save.
In Server parameters, confirm that the custom value is saved for log_line_prefix.

Figure 1: Set log_line_prefix so pgBadger can correctly parse timestamp, user, database, and process ID from each log line.

Prefix tokens

Token | Meaning
%m | Timestamp with milliseconds
%u | Username
%d | Database name
%p | Process ID

After this change, log lines should look like this example:

    2026-06-22 19:00:00.070 UTC user=pgadmin db=highcpu pid=3805603: LOG: statement: SELECT 1 FROM pg_extension WHERE extname='pg_stat_statements'

The matching pgBadger prefix for this log format is:

    %m user=%u db=%d pid=%p:

You will use this same value later in the pgBadger command.

What good looks like: The server parameter is saved, and new PostgreSQL log lines begin with timestamp, user, database, and process ID fields that match the pgBadger prefix.

Step 3: Enable Server logs for download

This is also a one-time setup step. In the Azure portal, open your Flexible Server and go to Server logs. Enable the portal setting:

    Capture logs for download

Set the retention period based on how long you want logs to remain available for download. For example, a 7-day retention period keeps logs available for download for 7 days.

Figure 2: Enable Capture logs for download and set a retention period long enough to cover the analysis window you want to inspect.

What good looks like: After Server logs are enabled, hourly PostgreSQL log files appear in the Server logs blade and can be downloaded from the Azure portal. The files are named by date and hour, for example:

    postgresql_2026_06_22_19_00_00.log
    postgresql_2026_06_22_20_00_00.log

Step 4: Download and organize the logs locally

From the Server logs page, select the .log files for the time window you want to analyze and download them.
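Because the files follow an hourly naming pattern, the set of files covering a window can be listed mechanically. The short Python sketch below does that; `server_log_names` is a hypothetical helper, and the pattern is inferred from the example file names in this post, so treat it as an assumption if your server names files differently.

```python
from datetime import datetime, timedelta


def server_log_names(start, end):
    """List the hourly file names that cover the window [start, end).

    The name pattern is inferred from the examples in the post
    (postgresql_YYYY_MM_DD_HH_00_00.log); times are taken as UTC.
    """
    hour = start.replace(minute=0, second=0, microsecond=0)
    names = []
    while hour < end:
        names.append(hour.strftime("postgresql_%Y_%m_%d_%H_00_00.log"))
        hour += timedelta(hours=1)
    return names


if __name__ == "__main__":
    for name in server_log_names(datetime(2026, 6, 22, 19, 0),
                                 datetime(2026, 6, 22, 21, 0)):
        print(name)
```

A window that starts mid-hour, say 19:30, still needs the 19:00 file, which is why the start is rounded down to the hour.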
For example, to analyze activity between 19:00 and 21:00 UTC, download:

    postgresql_2026_06_22_19_00_00.log
    postgresql_2026_06_22_20_00_00.log

On your local machine, create a folder for that analysis window. A simple convention is to name it in Mon-DD format:

    Jun-22

Place the downloaded .log files inside that folder. Your local folder structure should look like this:

    pgbadger-13.1/
      pgbadger
      Jun-22/
        postgresql_2026_06_22_19_00_00.log
        postgresql_2026_06_22_20_00_00.log

Step 5: Generate the pgBadger report

Open Git Bash from the folder where pgBadger is located. For example, if pgBadger is inside the pgbadger-13.1 folder, open Git Bash from that folder.

# | Action | Command
1 | Set the folder | FOLDER=Jun-22
2 | Confirm files | ls -lh ./$FOLDER
3 | Run pgBadger | Use the full command below.

Copy and run:

    FOLDER=Jun-22
    ls -lh ./$FOLDER
    perl -X ./pgbadger -f stderr \
      --prefix '%m user=%u db=%d pid=%p:' \
      ./$FOLDER/*.log \
      -o ./$FOLDER/pgbadger-report.html

Command breakdown

Part of command | Purpose
perl -X ./pgbadger | Runs pgBadger with Perl warnings disabled.
-f stderr | Parses PostgreSQL stderr log files.
--prefix '%m user=%u db=%d pid=%p:' | Matches the log_line_prefix set on the server.
./$FOLDER/*.log | Analyzes every .log file in the selected folder.
-o ./$FOLDER/pgbadger-report.html | Writes the HTML report into the same folder.

When the command completes successfully, you should see output like this:

    Parsed 12134249 bytes of 12134249 (100.00%), queries: 26684, events: 83
    LOG: Ok, generating html report...

What good looks like: pgBadger finishes parsing the logs and creates pgbadger-report.html in the selected folder.

Step 6: Open the report

Open the generated report. Copy and run:

    start ./$FOLDER/pgbadger-report.html

The report opens in your default browser. (start is the Windows opener available in Git Bash; on a Linux desktop, xdg-open does the same job.)
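If the report opens but looks empty, the usual cause is a mismatch between the server's log_line_prefix and the --prefix passed to pgBadger. The Python sketch below is a rough pre-flight check: it translates the prefix used in this post into a regular expression and reports what share of log lines start with it. The regex is my own approximation of `%m user=%u db=%d pid=%p:` (pgBadger does its own parsing), and `prefix_match_ratio` is a hypothetical helper. Lines that begin with whitespace are treated as continuations of multi-line statements and skipped.

```python
import re

# Approximate regex for log_line_prefix '%m user=%u db=%d pid=%p:'
# %m -> timestamp with milliseconds plus time zone, %u -> user,
# %d -> database, %p -> process ID. User and database may be empty
# for background processes, hence \S* rather than \S+.
PREFIX_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) (?P<tz>\S+) "
    r"user=(?P<user>\S*) db=(?P<db>\S*) pid=(?P<pid>\d+):"
)


def prefix_match_ratio(lines):
    """Share of log entries (ignoring continuation lines) that start
    with the expected prefix. Returns 0.0 when there are no entries."""
    entries = [ln for ln in lines if ln.strip() and not ln[0].isspace()]
    if not entries:
        return 0.0
    matched = sum(1 for ln in entries if PREFIX_RE.match(ln))
    return matched / len(entries)


if __name__ == "__main__":
    sample = [
        "2026-06-22 19:00:00.070 UTC user=pgadmin db=highcpu pid=3805603: "
        "LOG: statement: SELECT 1 FROM pg_extension WHERE extname='pg_stat_statements'",
    ]
    print(f"{prefix_match_ratio(sample):.0%} of entries match the prefix")
```

A ratio near 100% suggests the prefix is right and the problem lies elsewhere; a ratio near zero usually means the logs were written before log_line_prefix was changed, or that the --prefix value differs from the server setting.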
The final report is created here:

    Jun-22/pgbadger-report.html

What the report can show

The pgBadger report gives you a quick view into the workload shape for the selected log window. For example, in a sample run across two hourly log files, pgBadger summarized:

- Total number of queries.
- Number of unique normalized queries.
- Query traffic over time.
- Events such as errors and fatal messages.
- Session and connection patterns.

Once the report opens, start with Global Stats to confirm the time range, total queries, normalized queries, and query peak.

Figure 3: Start with Global Stats to validate the selected time range, total query count, normalized query count, and query peak.

Query volume and normalized queries

Many raw queries often reduce to a much smaller number of normalized query patterns. This helps identify whether the workload is spread across many different query shapes or dominated by a smaller set of repeated statements.

Example: In this sample run, 26,684 queries reduced to 59 normalized query shapes. That suggests the workload is mostly a small set of repeated statements, which can help focus tuning effort.

Traffic patterns

The SQL Traffic section helps identify spikes, quiet periods, and workload changes over time.

Figure 4: Use SQL Traffic to identify query spikes, quiet periods, and workload changes during the selected log window.

Figure 5: Review the query breakdown to compare read vs. write volume and query-type distribution for the selected Server logs window.

For example, if the report shows a steady baseline followed by a sharp spike, that spike can be correlated with application activity, batch jobs, synthetic tests, or operational events during the same time window.

Query duration

If query duration shows 0 ms or the slow query sections are empty, it usually means duration logging was not enabled when the logs were collected.
In that case, pgBadger can still show query counts and events, but it cannot calculate the slowest queries, total execution time, average duration, or maximum duration. To unlock those timing sections, enable log_min_duration_statement, collect fresh logs, and rerun pgBadger.

What pgBadger cannot infer from missing logs

pgBadger reports are only as complete as the log data you provide. If PostgreSQL did not log duration, lock waits, temporary files, or autovacuum activity during the selected time window, pgBadger cannot reconstruct those details later.

To analyze... | Enable before collecting logs
Slow queries | log_min_duration_statement
Lock waits | log_lock_waits
Temporary files | log_temp_files
Autovacuum activity | log_autovacuum_min_duration

Repeatable copy/paste block

Change only FOLDER for each new analysis window. Copy and run:

    FOLDER=Jun-22
    ls -lh ./$FOLDER
    perl -X ./pgbadger -f stderr \
      --prefix '%m user=%u db=%d pid=%p:' \
      ./$FOLDER/*.log \
      -o ./$FOLDER/pgbadger-report.html
    start ./$FOLDER/pgbadger-report.html

For another date, change only this line:

    FOLDER=Jun-22

Example folder values:

    FOLDER=Jun-23
    FOLDER=Jul-01
    FOLDER=Aug-15

Optional: Improve report quality

pgBadger can only analyze the information captured in PostgreSQL logs. The default logs may be enough for query frequency, connection activity, and errors. For deeper performance troubleshooting, consider enabling additional logging parameters based on your scenario.

Scenario | Parameter | Suggested value | Notes
Slow query analysis | log_min_duration_statement | 1000 | Logs statements slower than 1 second.
Short controlled test | log_min_duration_statement | 0 | Logs every statement. Use carefully.
Lock troubleshooting | log_lock_waits | on | Helps identify lock waits.
Temporary file analysis | log_temp_files | 0 | Logs all temporary files.
Autovacuum visibility | log_autovacuum_min_duration | 0 | Useful during focused analysis.
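Before running pgBadger on a large window, it can help to check which of these parameters actually left traces in the downloaded files. The Python sketch below counts lines containing a marker phrase for each parameter. The marker strings are assumptions based on the English messages stock PostgreSQL writes ("duration:", "still waiting for", "temporary file:", "automatic vacuum of table"); they are not taken from this post, and `logging_coverage` is a hypothetical helper. A zero count only means nothing was logged in that window, which can also happen when the parameter is on but no matching event occurred.

```python
# Parameter -> phrase expected in log lines when that parameter is in effect.
# Assumption: English-language messages as emitted by stock PostgreSQL.
MARKERS = {
    "log_min_duration_statement": "duration: ",
    "log_lock_waits": "still waiting for ",
    "log_temp_files": "temporary file: ",
    "log_autovacuum_min_duration": "automatic vacuum of table ",
}


def logging_coverage(lines):
    """Count, per parameter, how many lines carry its marker phrase."""
    counts = {param: 0 for param in MARKERS}
    for line in lines:
        for param, marker in MARKERS.items():
            if marker in line:
                counts[param] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample lines in the prefix format used in this post.
    sample = [
        "2026-06-22 19:00:00.070 UTC user=app db=shop pid=101: "
        "LOG: duration: 1500.123 ms statement: SELECT * FROM orders",
        "2026-06-22 19:00:05.210 UTC user=app db=shop pid=102: "
        "LOG: temporary file: path \"base/pgsql_tmp/pgsql_tmp102.0\", size 1048576",
        "2026-06-22 19:00:09.000 UTC user=app db=shop pid=103: "
        "LOG: statement: SELECT 1",
    ]
    for param, count in logging_coverage(sample).items():
        print(f"{param}: {count} matching lines")
```

To run it against real files, read each downloaded .log file line by line and pass the lines to `logging_coverage`; parameters with a zero count point to report sections that will likely be empty.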
Useful parameters include:

    log_lock_waits = on
    log_temp_files = 0
    log_autovacuum_min_duration = 0

To capture query durations, configure:

    log_min_duration_statement = 1000

This logs statements that run longer than 1000 milliseconds. For short test runs only, you can temporarily use:

    log_min_duration_statement = 0

Caution: Use log_min_duration_statement = 0 carefully on busy production servers. It logs every statement and can generate a large volume of logs.

Duration matters: If duration logging is not enabled, pgBadger can still show query counts and events, but slowest-query, total duration, average duration, and maximum duration sections will be limited or empty.

Common mistakes and quick fixes

Symptom | Likely cause | Fix
Report is empty | Prefix mismatch | Match --prefix with log_line_prefix.
No duration data | Duration logging was not enabled | Set log_min_duration_statement before collecting logs.
No files visible | Server logs disabled or retention expired | Enable capture and check retention.
pgBadger command fails | pgBadger is not in the current folder or path | Run pgbadger -V to confirm installation.

Common troubleshooting FAQs

1. Report is created but empty

This usually means the pgBadger prefix did not match the actual log format. Check the first few lines:

    head -5 ./$FOLDER/*.log

Make sure the pgBadger --prefix matches the server’s log_line_prefix.

2. Report shows queries but no duration

PostgreSQL logged statements but did not log durations. Enable one of the following, collect fresh logs, and rerun pgBadger:

    log_min_duration_statement = 1000
    # or temporarily for testing
    log_min_duration_statement = 0

3. No .log files are visible

Confirm that Server logs are enabled through the portal setting:

    Capture logs for download

Also check the retention period. If the retention period has expired, older logs may no longer be available for download.

4. pgBadger command fails

Confirm that pgBadger is available in the current folder or installed in your path:

    pgbadger -V

If you are running pgBadger from the local folder, use:

    perl -X ./pgbadger

Summary

For customers already using Azure Database for PostgreSQL Flexible Server logs, the pgBadger workflow is straightforward:

1. Install pgBadger.
2. Configure log_line_prefix.
3. Enable Server logs for download.
4. Download the .log files.
5. Place them in a local date-based folder.
6. Run pgBadger with the matching prefix.
7. Open pgbadger-report.html.

Bottom line: Server logs give you the shortest path from Azure Database for PostgreSQL Flexible Server logs to a pgBadger report. Download the native .log files, run pgBadger with the matching prefix, and open the generated HTML report.

References

- pgBadger - source and documentation (GitHub)
- pgBadger - project site
- Azure - Download server logs from the portal (Flexible Server)
- Azure - Logging concepts (Flexible Server)
- Azure - Configure server parameters via the portal
- PostgreSQL - log_line_prefix and logging parameters
varun-dhawan
Jun 30, 2026 Place Microsoft Blog for PostgreSQL
Announcing new security, maintenance and analytics features for PostgreSQL at Microsoft Build 2026
At Microsoft Build 2026, we’re announcing a major wave of PostgreSQL innovation across Azure. Alongside the public preview of Azure HorizonDB, we’re delivering a broad set of enhancements for our fully managed open-source PostgreSQL service: Azure Database for PostgreSQL flexible server. These updates span performance, analytics, security, operations, resilience and migration, helping you build faster, operate with more control, secure your workloads, and modernize with confidence. Here’s a quick tour of the top flexible server announcements at Build 2026.

Feature Highlights

- pg_duckdb Extension
- pg_ivm Extension
- Defender Security assessments
- temporal_tables Extension
- Cross-tenant CMK
- Automatic Entra token refresh libraries
- New PowerShell module: Az.PostgreSQLFlexibleServer
- More control over planned maintenance
- Pre-Upgrade validation checks
- New Built-in Grafana dashboards
- Chaos Studio supports Azure Database for PostgreSQL
- AI-assisted Oracle to PostgreSQL migration
- Migration Service for Azure Database for PostgreSQL improvements (EDB, AlloyDB)

Performance, Scale & Analytics

pg_duckdb Extension (Generally Available)

The pg_duckdb extension enables you to accelerate high-performance analytics and data-intensive applications with DuckDB’s SQL engine running inside your Postgres server. We’re pleased to announce pg_duckdb is now generally available in Azure Database for PostgreSQL. The latest version builds on the preview with the latest DuckDB engine improvements and optimized performance. This version adds vectorized execution for faster analytical queries, delivering significant improvements in aggregation performance, along with new support for writing to Azure Blob Storage and querying Parquet data directly from PostgreSQL. These capabilities enable high-performance analytics on your external data and simplify data processing workflows. Learn more: pg_duckdb.
pg_ivm Extension (Generally Available)

Materialized views are a useful way to optimize performance for queries that run regularly, but when the underlying data changes, the view goes stale and the result set has to be recomputed. With the pg_ivm extension you can automatically maintain materialized views as the underlying data changes. This is particularly valuable for large datasets with small incremental changes that need real-time freshness, like dashboards, catalog analytics and SaaS usage reporting. We are pleased to announce the pg_ivm extension is now generally available in Azure Database for PostgreSQL. Learn more: pg_ivm.

Security, Auditing & Identity

Defender security assessments (Preview)

Microsoft Defender Security Assessments for Azure Database for PostgreSQL enables continuous evaluation of your database security posture, helping identify vulnerabilities and misconfigurations across server and database configurations. Defender was previously limited to reactive threat detection. In the latest preview release, it now provides proactive, risk-based insights through assessments tailored to PostgreSQL-specific best practices, delivering more relevant and actionable guidance. This helps you strengthen your security baseline, prioritize remediation, and align with best practices and compliance requirements. Learn more: https://aka.ms/Defender-Assessments-for-PG-Preview

temporal_tables Extension (Generally Available)

We’ve had many customer requests to support the temporal_tables extension, which provides built-in support for tracking and querying historical changes to data over time. Temporal tables are now generally available in Azure Database for PostgreSQL. With this extension enabled you can easily perform time-based queries, audit data changes, and maintain historical records without building custom tracking logic, simplifying application development and compliance scenarios.
Learn more: temporal_tables

Cross-tenant CMK (Preview)

Azure Database for PostgreSQL now supports cross-tenant customer-managed keys (CMK) in public preview, allowing you to encrypt your data at rest using an Azure Key Vault key that resides in a separate Microsoft Entra tenant from the database service. This feature is designed for SaaS providers and enterprises that need to maintain strict separation of duties and ownership of encryption keys, enabling you to retain full control over key lifecycle management while PostgreSQL runs in a service provider’s tenant. Learn more: Data encryption at rest in Azure Database for PostgreSQL

Automatic Entra token refresh libraries (Preview)

We’re making it easier to use Entra ID authentication with Azure Database for PostgreSQL throughout the application stack by introducing new token refresh libraries for .NET, JavaScript, and Python. With Entra ID, access tokens are short-lived, which can make managing their lifecycle complex in real-world applications. Developers need to be aware of token refresh and build additional handling around token expiration, connection retry, and session continuity. These new libraries remove that friction. By handling Entra token refresh seamlessly in the background, they allow applications to stay connected without interruption and with no custom logic required. The result is a simpler development experience and more resilient applications, especially for long-running or connection-heavy workloads. Across languages, the libraries provide a consistent and streamlined way to adopt secure, passwordless authentication, helping teams focus more on building their applications and less on managing authentication. Learn more: .NET, JavaScript, and Python.
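To show the kind of custom logic these libraries are meant to replace, here is a minimal Python sketch of a hand-rolled token cache that refreshes shortly before expiry. It does not use or reproduce the new libraries' API, which this post does not describe. `TokenCache`, the `fetch` callable, and the five-minute refresh margin are all hypothetical; in a real application `fetch` would wrap whatever call obtains an Entra access token.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class TokenCache:
    """Caches a short-lived access token and refreshes it before it expires.

    fetch: returns (token, expires_at) where expires_at is epoch seconds.
    refresh_margin: refresh this many seconds ahead of expiry (assumed value).
    clock: injectable time source, which keeps the class easy to test.
    """
    fetch: Callable[[], Tuple[str, float]]
    refresh_margin: float = 300.0
    clock: Callable[[], float] = time.time
    _token: Optional[str] = None
    _expires_at: float = 0.0

    def get(self) -> str:
        # Refresh when there is no token yet or it is about to expire.
        if self._token is None or self.clock() >= self._expires_at - self.refresh_margin:
            self._token, self._expires_at = self.fetch()
        return self._token


if __name__ == "__main__":
    def fake_fetch():
        # Stand-in for a real token request; valid for one hour.
        return "example-token", time.time() + 3600

    cache = TokenCache(fetch=fake_fetch)
    password = cache.get()   # with Entra auth, the token is used as the password
    print(len(password) > 0)
```

Even this small version leaves out the parts the post calls out as painful: reconnecting pooled connections when the token rotates, retrying on expiry mid-session, and thread safety. That gap is the argument for using a maintained library instead.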
Operations, Maintenance & Monitoring New PowerShell module: Az.PostgreSQLFlexibleServer Generally Available We’re excited to introduce the newly renamed Az.PostgreSQLFlexibleServer PowerShell module, delivering a streamlined experience for managing Azure Database for PostgreSQL with PowerShell. Building on the capabilities of the previous Az.PostgreSql module, the updated module aligns with the new features in the 2026-01-01 preview REST API. This module brings support for PostgreSQL 18, elastic clusters for scalable workloads, and a range of enhancements designed to simplify management and improve performance. Whether you're provisioning new deployments or managing complex environments, this module ensures you can take full advantage of the latest platform capabilities directly from PowerShell. To learn more, visit our official documentation on PowerShell: Az.PostgreSql Module | Microsoft Learn More control over planned maintenance Generally Available We’ve seen many requests to provide more control over when a maintenance update is applied to Azure Database for PostgreSQL. Sometimes when a critical workload is running you want to apply the maintenance when you’re ready. Announcing general availability this week, we’re building on the existing System and Custom maintenance window options and adding new self-service maintenance capabilities to the Azure portal. You can now reschedule upcoming maintenance updates for up to two weeks and apply maintenance on demand at a time that suits you. You can also view scheduled maintenance and review your server’s maintenance history after updates are complete. These options help you better align maintenance with your business schedules, reduce disruption during critical workload periods, and minimize the need for support-driven deferral requests. CLI and API support are coming soon. 
Learn more: https://aka.ms/azure-postgres-reschedule-maintenance Pre-Upgrade validation checks Preview Major version upgrades are critical for staying current with PostgreSQL features, security updates, and performance improvements, but you often discover blockers only after starting the upgrade workflow. Pre-Upgrade Validation Checks lets you validate upgrade readiness before initiating the actual upgrade by running Azure-specific upgrade checks and PostgreSQL pg_upgrade --check validations independently. The shift is simple: you can identify and fix upgrade blockers before the upgrade window begins. The feature surfaces actionable issues across configurations, extensions, dependencies, replication slots, event triggers, and other upgrade-sensitive objects. You can fix blockers, re-run validation until all checks pass, and proceed with the upgrade with greater predictability. Learn more: https://aka.ms/pg-flex-upgrade-checks New Built-in Grafana dashboards Generally Available Grafana dashboards are now built directly into the Azure portal for Azure Database for PostgreSQL - no setup, no extra cost, and no separate service to manage. You can open your PostgreSQL resource in the portal and immediately access prebuilt dashboards for key health and performance signals such as CPU, memory, storage, IOPS, connections, transactions, and availability. The key value is metrics + logs in one place. You can quickly correlate performance spikes with PostgreSQL logs, understand what changed, and troubleshoot faster using the familiar Grafana experience. Dashboards can also be customized, saved to your subscription, and shared across teams for ongoing operations. Learn more: https://aka.ms/azure-postgres-dashboards-grafana Resilience & Business Continuity Chaos Studio supports Azure Database for PostgreSQL Preview No matter how much you prepare, you only really know how good your database disaster recovery plan is when something breaks. 
With Chaos Studio support for Azure Database for PostgreSQL, you can simulate zone-down scenarios on PostgreSQL HA-enabled instances and validate the resilience of your mission-critical workloads. With Chaos Studio integration, you can proactively test failover behavior and gain confidence in how your applications respond to real-world zonal failures. This feature is currently available through a gated private preview. To get started, submit your subscription details using the form. Once reviewed, our team will enable the feature for your subscription, with guidance to help you begin testing. Getting started is simple: Create a Chaos Studio workspace via the Chaos Studio portal and configure your subscription, resource group, and region. Define the scope and assign the required managed identity and permissions. Review and verify your workspace setup. Browse available scenarios and select the PostgreSQL zone-down scenario. Configure the test (name, duration), then run it from My Library to begin validating failover behavior. With just a few steps, you’ll be able to simulate real-world failure conditions and gain confidence in your application’s resilience. To get started, please submit your details using this link: Private Preview Support for Chaos Studio Migration & Modernization AI-assisted Oracle to PostgreSQL migration Generally Available AI-assisted migration tooling has dramatically lowered the bar for moving between different databases and is changing the way people look at the return on investment for migration. The VS Code PostgreSQL extension comes with AI-Assisted migration tooling which converts Oracle schema and application code to Azure Database for PostgreSQL. This tooling uses GitHub Copilot, Microsoft Foundry, and custom Language Model tools to convert Oracle schema, database code and client applications into the PostgreSQL equivalents, and validates every change against a running flexible server instance. 
Learn more: Schema conversion, App conversion. Migration Service for Azure Database for PostgreSQL improvements (EDB, AlloyDB) Generally Available We’ve added AlloyDB and EDB Extended Server as new sources for migrating to PostgreSQL in the Azure Database for PostgreSQL Migration Service, with support for both online and offline migration. Learn more: Migrate from AlloyDB, Migrate from EDB. Looking ahead That wraps up the Build 2026 announcements for Azure Database for PostgreSQL flexible server. There are also many great PostgreSQL technical sessions at Build this week, covering cloud-native app & AI development and migration. To find out more, here's a link to the Build session catalog for PostgreSQL sessions: https://aka.ms/Postgres-on-Azure_Build-2026. We'll continue to build out our roadmap over the coming months to deliver on your asks to improve the performance, security and stability of your PostgreSQL workloads. Check the Microsoft Blog for PostgreSQL for a regular monthly recap where we share the latest enhancements and product updates.
GuyBowerman
Jun 29, 2026 Place Microsoft Blog for PostgreSQL
1.1KViews
2likes
0Comments
Azure Copilot Observability Agent is generally available, with autonomous operations in preview
Complex cloud environments have outpaced manual operations. Agentic cloud operations connect people, tools, and data to streamline investigation workflows and move teams from scattered signals to evidence-backed next steps. With unified observability, teams can investigate Azure-monitored applications, Azure Kubernetes Service (AKS) environments, VMs, Foundry telemetry, infrastructure, and platform signals with greater context and control. Powered by Azure Monitor, the Azure Copilot Observability Agent is now generally available. It helps engineering, SRE, DevOps, and operations teams move from telemetry and alert noise to investigated issues, explainable reasoning, and recommended next steps that can reduce Time-To-Mitigate (TTM). Autonomous operations are also available in public preview. They help prepare context and reduce triage work while people remain responsible for mitigation decisions and any changes to the environment. From alert noise to investigated issues The Observability Agent helps teams reduce the effort required to understand operational problems. Instead of starting every investigation from a dashboard, query editor, or alert payload, teams can work with an AI companion that reasons across telemetry, Azure resource context, discovered topology, and custom instructions to identify what changed, what is correlated, and what evidence supports the conclusion. Teams can start with natural-language exploration and continue into deeper investigations when an issue requires more evidence. That light-to-deep workflow helps responders move from broad questions to a structured investigation without losing the reasoning trail. Here's what this looks like in practice: after a deployment, several alerts might fire across an app, database dependency, and compute resource. 
The Observability Agent can group those signals around the affected service, identify when the regression started, compare related dependencies and infrastructure metrics, and capture the findings in an Azure Monitor issue. The responder can then validate the evidence, add team context, route work to the right owner, and decide whether a rollback, configuration change, or code fix is appropriate. Explainable investigations across Azure-monitored signals Operations teams need more than a chatbot that answers questions. The Observability Agent follows an investigation workflow: it frames hypotheses, gathers evidence, compares signals by time, scope, and type, rules out weak explanations, and shows the reasoning path behind its findings. The Observability Agent can help teams: Investigate incidents and alerts across Azure-monitored applications, Azure Kubernetes Service (AKS) environments, VMs, Foundry telemetry, infrastructure, and platform signals Correlate related signals to reduce noise and surface higher-signal issues with context Explore telemetry using natural language while preserving transparency into the supporting data Compare signals by time, scope, and type to separate likely causes from coincidental changes Provide a reasoning trail that shows what the agent found, what it ruled out, and why Recommend next steps that engineers can review before deciding how to act This same investigation model applies to specialized skills and issue types, including customer's application, Azure Kubernetes Service (AKS), Foundry, VMs, and GenAI issues. When the relevant telemetry is available, the Observability Agent can correlate logs, metrics, traces, alerts, dependencies, resource graph, resource health, activity logs, Foundry telemetry, and changes. 
This helps teams investigate customer-visible issues with evidence, including latency, token spikes, tool-call failures, agent errors, hallucinations, deployments, API failures, performance regressions, infrastructure dependencies, and platform incidents. This explainability is central to the product. In production operations, trust is earned through evidence. The Observability Agent is built to support human judgment, not bypass it. Azure expertise, with context from your environment Context matters in every investigation. The same symptom can mean different things depending on application architecture, recent deployments, dependencies, historical incidents, and team practices. The Observability Agent brings Microsoft and Azure operational knowledge into the investigation experience. It can use discovered topology, Azure resource context, logs, metrics, traces, and custom instructions to ground investigations in signals that are more relevant to your environment. Native to Azure Monitor, with humans in control Because the Observability Agent is built into Azure Monitor, teams can use it close to the telemetry, alerts, and workflows they already rely on. Investigations can also be captured as Azure Monitor issues, creating a shared case file for humans and agents to collaborate on evidence, reasoning, and next steps. The Observability Agent is designed for governed AI operations inside Azure Monitor. Interactive chat and investigations use the signed-in user's identity and Azure role-based access control (RBAC). Prompts and responses are not used to train foundation models, and the agent doesn't restart resources, change configuration, or resolve issues on its own. Autonomous operations in public preview Alongside general availability, autonomous operations for the Observability Agent are available in public preview. 
When enabled, the agent can analyze alerts in the background, correlate related alerts when they likely represent the same incident, create Azure Monitor issues automatically, and run deep investigations on agent-created issues. This automatic triage helps reduce alert noise by turning streams of individual alerts into higher-signal issues with context, findings, and recommended next steps. Teams can review the issue, continue the investigation, and decide what action to take. Autonomous operations are designed to prepare context and reduce triage work, not to remove human control. Engineers remain responsible for decisions, approvals, and any changes to the environment. Next steps Check out our latest announcements and related blogs: Azure Blog and OMB Blog. Learn how to use the Observability Agent in Azure Copilot Observability Agent. Explore how investigations work in Deep investigations in the Azure Copilot Observability Agent. Learn more on how to Chat with your observability data Learn how teams preserve context in Azure Monitor issues. Review preview details in Autonomous operations in the Azure Copilot Observability Agent. Stay connected Follow this blog for ongoing deep dives, updates on current capabilities, and a preview of what's coming next. Live webinar - a walkthrough of real Observability Agent scenarios, best practices, and what's available today - along with a look at what's coming next, and live Q&A with the product team. Register for the Observability Agent webinar. We'd love your feedback The Observability agent continues to evolve based on real-world usage and operator feedback. Share your thoughts directly through the Give Feedback option in the experience, or reach us at enauerman@microsoft.com.
EfratNauerman
Jun 23, 2026 Place Azure Observability Blog
8.7KViews
6likes
0Comments
Azure PostgreSQL Fleet-Level Monitoring with Log Analytics and KQL
Use Log Analytics (KQL) to enable fleet-level alerting and advanced monitoring across Azure PostgreSQL instances.
varun-dhawan
Jun 18, 2026 Place Microsoft Blog for PostgreSQL
7.4KViews
6likes
4Comments
Accelerating AKS troubleshooting with the Azure Copilot Observability Agent
AKS incidents rarely stay within one Kubernetes object, signal, or tool. A latency spike might first appear in application telemetry, but the root cause may sit elsewhere: pod restarts, node pressure, scheduling failures, or a recent configuration change. The Azure Copilot Observability Agent in Azure Monitor helps connect these signals into an explainable investigation, so teams can move from symptoms to evidence-backed next steps. Why AKS troubleshooting is complex Troubleshooting Azure Kubernetes Service (AKS) is complex because failures can originate in workloads, platform components, infrastructure, or the application code running on the cluster. For example, pods stuck in Pending may indicate capacity or scheduling issues, while application latency may be caused by throttling, failed probes, pod restarts, or node pressure below the app. During an incident, simply having more telemetry is not enough. Teams need a way to test likely causes, rule out unrelated signals, and keep the investigation tied to the affected workload and time window. From signal to root cause: the investigation flow The Observability Agent follows a consistent investigation pipeline: Scope the problem by identifying the most likely infrastructure resources involved, plus connected dependencies. Collect data across metrics, logs, traces, change history, and related signals. Detect anomalies using learned baselines (for metrics) and log analysis. Correlate across resources spanning infrastructure and application layers. Run deep diagnostics by invoking resource-specific tools when needed to pinpoint root cause. Summarize findings in a structured format: what happened, why it happened, and what to do next. AKS investigation data sources The agent works with telemetry already available in your Azure Monitor environment. 
Investigation depth improves as more relevant signals are enabled, including Container insights logs, Kubernetes events and state, Azure managed service for Prometheus, container and pod logs, Application Insights telemetry for AKS-hosted workloads, Azure Activity Log changes, control plane logs routed through diagnostic settings, and resource metadata for the cluster, node pools, workloads, and related Azure resources. Figure 1. AKS investigation data sources You don’t need to enable every telemetry source to get started. The Observability Agent uses the data already available in Azure Monitor, and its findings become more complete as more AKS and application signals are collected. Example 1: AKS infrastructure — explaining why new pods never start Consider a workload rollout on AKS where replacement pods remain stuck in Pending state. What looks like a failed release may stem from the workload definition, cluster state, or underlying infrastructure. Investigation walkthrough Symptom: rollout is blocked Replacement pods remain in Pending during rollout, and Kubernetes events show repeated scheduling failures. This indicates that the rollout is blocked before new pods can start. Workload evidence: scheduling, not startup Pod state identifies the affected workload, while Kubernetes events show repeated placement failures. The issue is therefore tied to scheduling rather than application startup or container crash behavior. Cluster evidence: capacity pressure When enabled, Prometheus node metrics show CPU and memory utilization near capacity. Cluster-level trends show resource pressure increasing at the same time as pending pods and scheduling failures. Likely cause: insufficient schedulable capacity The scheduler cannot place new pods because the relevant node pool does not have enough available capacity. The failed rollout is best explained by capacity pressure in the target node pool rather than an application crash or image startup failure. 
Recommended action Scale out the affected node pool or adjust workload resource requests, then retry the rollout once schedulable capacity is restored. Figure 2. AKS investigation flow The Observability Agent connects pod state, scheduling events, and node pressure to explain why the rollout is blocked and which capacity action to consider next. Example 2: Joint app-AKS investigation — tracing application latency to pod restarts Now consider a customer-facing application where users see increased latency and intermittent HTTP 5xx errors after deployment. The first symptom appears in application telemetry, but the unhealthy requests are served by pods that are repeatedly restarting in AKS. Investigation walkthrough Symptom: customer-facing service degradation After deployment, application telemetry shows increased latency and HTTP 5xx errors. The first visible impact appears at the application layer. AKS evidence: unstable pods Affected pods enter CrashLoopBackOff, restart counts increase, and Kubernetes events show back-off restarts, probe failures, or image or command errors. Container logs point to startup exceptions, missing configuration, or crash details. Resource evidence: workload-specific pressure Container memory usage approaches configured limits before restarts, while node metrics show no broad node pressure. This suggests the issue is workload-specific rather than cluster-wide capacity related. Change evidence: deployment correlation Deployment history shows a new image or configuration change shortly before restarts began, with no matching platform health event. The timing points to the latest deployment or configuration change. Recommended action Review the latest image or configuration change, inspect container logs, adjust memory limits, or roll back if needed. Focus remediation on the workload change rather than node pool scaling. This pattern shows how an application symptom can map back to AKS workload behavior. 
Application telemetry establishes the user impact, while Kubernetes events, container logs, and resource metrics help explain why the affected pods keep failing. Operational impact For site reliability engineers, platform teams, and IT professionals, the Observability Agent reduces the time spent moving between application and AKS telemetry. It brings relevant signals into one investigation, surfaces supporting evidence, and applies Azure Monitor and AKS context so your team can review the findings, validate the recommended path, and decide which production changes to make. Figure 3. AKS investigation results Using the Observability Agent You can start using the Observability Agent from the Azure portal in two common AKS troubleshooting flows: Investigation mode: Start an investigation from an Azure Monitor alert on an AKS resource or from an Application Insights alert for an AKS-hosted workload. The agent uses the alert context to scope the incident, correlate application and cluster telemetry, and summarize the likely cause with recommended next steps. Chat-based exploration: Open the Monitor experience in AKS and select the Observability Agent button to chat with your telemetry. Use natural language to ask follow-up questions, explore logs and metrics, detect and inspect anomalies, and narrow down likely causes. Figure 4. Starting Observability Agent from AKS Monitor experience Next steps Azure Copilot Observability Agent overview Monitor Azure Kubernetes Service with Azure Monitor Stay connected Follow this blog for ongoing deep dives, updates on current capabilities, and a preview of what's coming next. Live webinar — A walkthrough of real Observability Agent scenarios, best practices, and what's available today, along with a look at what's coming next and live Q&A with the product team. Register for the Observability Agent webinar. We'd love your feedback The Observability Agent continues to evolve based on real-world usage and operator feedback. 
Share your thoughts directly through the Give Feedback option in the experience, or reach us at: azureobsagent@microsoft.com
yairgil
Jun 17, 2026 Place Azure Observability Blog
223Views
0likes
0Comments
Anomaly detection made easy with Dynamic thresholds for Log search alerts
We’re excited to announce the General Availability of dynamic thresholds for log search alerts in Azure Monitor. Dynamic thresholds make anomaly detection easier by using machine learning to learn normal behavior from your historical log query results, automatically account for patterns such as hourly, daily, and weekly seasonality, and adapt as your environment changes. Instead of manually choosing static limits that can quickly become outdated, you can let Azure Monitor automatically determine the right threshold for each alert rule. Dynamic thresholds for Log search alerts are available at no extra charge - you pay the standard log search alert rule rate. Why it matters Simplified configuration: No need to fine-tune thresholds manually. Adaptive monitoring: Alerts automatically adapt to changing usage patterns and trends. At-scale intelligence: For multi-dimensional monitoring, thresholds are calculated per dimension combination. Example use cases AKS Pod restart spike anomaly detection Scenario: Monitor Kubernetes Pod logs for spikes in pod restarts across clusters. Why dynamic thresholds help: AKS workloads often scale dynamically; static thresholds can’t account for autoscaling patterns. Dynamic thresholds adapt to normal fluctuations in node/pod counts and alert only on true anomalies. Example query: KubePodInventory | summarize restartCount = sum(PodRestartCount) by bin(TimeGenerated, 10m), ClusterName, Namespace, Name Dynamic threshold settings: Measure: restartCount (the aggregated column from the query). Split by dimensions (optional): Namespace (for workload-level baselines). Name (for per-pod granularity if needed). 
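To make "thresholds are calculated per dimension combination" concrete, here is a minimal sketch. It deliberately swaps in a simple mean-plus-standard-deviation baseline, which is not the machine learning model Azure Monitor uses and ignores seasonality; `upper_thresholds`, `is_anomalous`, and `sensitivity` are hypothetical names chosen for illustration.

```python
from collections import defaultdict
from statistics import mean, pstdev
from typing import Dict, Iterable, List, Tuple

Row = Tuple[str, str, float]  # (Namespace, Name, restartCount for one time bin)
Key = Tuple[str, str]


def upper_thresholds(history: Iterable[Row], sensitivity: float = 3.0) -> Dict[Key, float]:
    """Compute one upper threshold per (Namespace, Name) combination from historical bins."""
    series: Dict[Key, List[float]] = defaultdict(list)
    for namespace, name, restart_count in history:
        series[(namespace, name)].append(restart_count)
    # Simplified baseline: mean plus a multiple of the population standard deviation.
    return {key: mean(values) + sensitivity * pstdev(values) for key, values in series.items()}


def is_anomalous(row: Row, thresholds: Dict[Key, float]) -> bool:
    """Flag a new bin only when it exceeds the threshold learned for its own combination."""
    namespace, name, restart_count = row
    threshold = thresholds.get((namespace, name))
    return threshold is not None and restart_count > threshold
```

With this shape, five restarts in a bin can be an anomaly for a pod that normally restarts once at most, and unremarkable for a batch job that routinely restarts ten times. A single static threshold cannot express that difference.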
Resource Inventory Drift Detection (Azure Resource Graph) Scenario: Detect sudden spikes in resource creation or deletion across subscriptions or management groups that may indicate runaway deployments, using the Log search alerts integration with Azure Resource Graph. Why dynamic thresholds help: Large organizations often have thousands of resources with varying deployment patterns. Static thresholds can’t account for seasonal changes (e.g., monthly deployments, scaling events). Dynamic thresholds adapt per subscription or resource type, reducing false positives. Example query: arg("").Resources | summarize resourceCount = count() by type, subscriptionId Dynamic threshold settings: Measure: resourceCount (the aggregated column from the query). Split by dimensions (optional): type (for specific resource type changes). subscriptionId (for per-subscription granularity). Getting Started Learn more about Log search alerts with dynamic thresholds and how to set up alert rules in Azure Monitor.
Efrat_Ben_Porat
Jun 16, 2026 Place Azure Observability Blog
219Views
0likes
0Comments
General Availability: Simple log alerts in Azure Monitor
We are excited to announce the General Availability of Simple log alerts in Azure Monitor! This feature is designed to provide a simplified and more intuitive experience for monitoring and alerting, enhancing your ability to detect and respond to problems in near real-time. Simple log alerts are a type of Log search alert in Azure Monitor, designed to provide a simpler alternative to traditional Log search alerts. Unlike Log search alerts that aggregate rows over a defined period, Simple log alerts evaluate each row individually. Simple log alerts also support Basic logs. Previously, choosing Basic logs for cost optimization - for example, configuring the traces table in Application Insights with the Basic logs plan - meant giving up the ability to alert on that data. Simple log alerts close that gap, so you can keep the cost savings and alert on telemetry stored in Basic logs. 🌐 When to use Simple log alerts are ideal for monitoring applications or network traffic where unaggregated, real-time detection and quick incident response are critical. Example scenarios: Failed automation jobs - get notified the moment a backup job, scheduled task, or any automated process fails, rather than waiting for an aggregation window. Windows events affecting storage or security - alert on individual event log entries that signal disk failures, security breaches, or service disruptions. 🔁 Flexible Trigger Recurrence By default, Simple log alerts fire on every matching row, but you can tune this to reduce noise. Choose to alert only when the condition is met at least once, twice, three times, or a custom number of times within a minute - giving you control over sensitivity without sacrificing low latency. 💰 Pricing Information Simple log alert rules evaluate your data every minute, so billing is the same as 1-minute frequency alert rules. For detailed pricing information, refer to Pricing - Azure Monitor | Microsoft Azure. 
You will see these rules in your billing statement tagged with kind:simplelogalert. 📚 Documentation and Links Create a simple log search alert in Azure Monitor - Azure Monitor | Microsoft Learn Overview of Azure Monitor alerts - Azure Monitor | Microsoft Learn
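The per-row evaluation and trigger recurrence described in this post can be sketched as follows. This is a simplified model under stated assumptions, not how Azure Monitor implements the feature: it reports which one-minute windows would meet the recurrence setting rather than emitting individual alerts, and `minutes_that_fire`, `matches`, and `min_matches_per_minute` are hypothetical names.

```python
from collections import Counter
from datetime import datetime
from typing import Callable, Dict, Iterable, List, Tuple

LogRow = Tuple[datetime, Dict[str, str]]  # (timestamp, columns)


def minutes_that_fire(
    rows: Iterable[LogRow],
    matches: Callable[[Dict[str, str]], bool],
    min_matches_per_minute: int = 1,
) -> List[datetime]:
    """Return the minutes in which the number of matching rows reaches the configured minimum."""
    per_minute: Counter = Counter()
    for timestamp, columns in rows:
        # Each row is checked on its own; there is no aggregation query.
        if matches(columns):
            per_minute[timestamp.replace(second=0, microsecond=0)] += 1
    return sorted(
        minute for minute, count in per_minute.items() if count >= min_matches_per_minute
    )
```

Raising `min_matches_per_minute` from 1 to 2 in this model suppresses minutes with a single failed job while still reporting minutes with repeated failures, which is the noise-versus-sensitivity trade-off the recurrence setting exposes.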
Efrat_Ben_Porat
Jun 16, 2026 Place Azure Observability Blog
392Views
1like
0Comments
The Azure Copilot Observability Agent Chat - Stop Writing Queries, Start Asking Questions.
Services and applications produce massive amounts of telemetry, and making sense of all this data takes effort. Data is often spread across different stores, so getting to clear insights takes careful querying, refinement, and correlation. The Azure Copilot Observability agent now has a chat experience that simplifies this dramatically: you just ask, in plain, natural language. Ask questions. Get answers. To start chatting with the Observability agent, select a resource in the Azure portal, and choose Logs from the resource menu. Click the Observability agent button. Soon, additional Azure observability experiences will show this or similar buttons so you can chat with the agent throughout your observability process. The Observability agent chat opens with a short intro message and a few suggested prompts. Select one of the suggestions or type your question in natural language: “What errors increased in the last 24 hours?” “¿Existen anomalías de latencia?” (are there any latency anomalies?) “どの依存関係が失敗しているか” (which dependencies are failing?) The agent translates your prompt into queries across all relevant data sources, analyzes your data, and returns clear, data-backed insights, so you don't need to write KQL queries, switch between logs and metrics experiences, or dive into the schemas of your data store. Explore your data – interactively The chat experience is designed for an interactive process of data exploration and troubleshooting. Through the chat you can explore trends in logs and metrics, identify anomalies, and visualize results directly in the chat, all from one interface. Note: The agent operates here as your personal observability assistant. It can only query data on your behalf and access resources that you can access. The chat with the agent has a progressive exploration flow, instead of isolated queries. 
Still, at each step of the conversation the agent provides a clear chain of thought, including the actual queries it used, so you can track how it understood your prompt and produced its output. Results are shown clearly and explained. In the example shown here, we follow up and ask the agent to create a time chart of the failed operations impacted by the errors it reported earlier. The result is clear - GET Customers/Details was impacted significantly, reaching 100K failed requests over a long time: From exploration to guided investigation The chat is very useful for guided investigations that go as deep as you choose, just as you would with the classic analysis tools over logs or metrics. Following the example shown above, we ask the agent to show exceptions or traces correlated with the failed requests: The agent found an association with NullReferenceException, and suggests going deeper and using the operation_Id field to clearly identify the request -> dependency -> exception sequence. We'll accept the recommendation and choose the first suggestion: Pull full transaction timeline. And here it is - each step of the transaction timeline explained, and the culprit is found - a failed Azure Table dependency. We didn't have to write queries, review metrics, join tables or even know which tables are there. We used standard terms to ask questions in natural language, and we were able to get as deep as we wanted, and can dive deeper still. For example, you can tell the agent to: Map this call chain into a sequence-diagram style summary showing request, SQL dependency, table write, and exception. Calculate the average request latency during the last 6 hours, split by client type, location and OS Find anomalies in the exceptions logged over the last 4 hours Create a time chart to show the top 3 anomalies How many users were impacted by each of the top 3 anomalies found? 
Break down the exception counts by request operation Launching a deep investigation Through the chat with the observability agent, you can also trigger a full, deep investigation process. A deep investigation doesn't handle just one question, but investigates an incident thoroughly - maps all related resources, identifies anomalies, performs correlations, analyzes root causes, and eventually provides a detailed report, including findings and recommendations. To start a deep investigation - select it from the suggestions provided during the conversation, or ask the agent explicitly, for example: run a deep investigation on the NullReferenceException anomaly. Final thought If observability used to start with queries – it now starts with a conversation. You can either guide the agent through the process you want to go through - or let it investigate on its own. Just ask. Stay connected Follow this blog for ongoing deep dives, updates on current capabilities, and a preview of what’s coming next. Check out our recent public preview update of the Azure Copilot Observability agent. Live webinar A walkthrough of real Observability agent scenarios, best practices, and what’s available today - along with a look at what’s coming next, and live Q&A with the product team. 👉 Register here We’d love your feedback The Observability agent continues to evolve based on real‑world usage and operator feedback. Share your thoughts directly through the Give Feedback option in the experience, or reach us at: azureobsagent@microsoft.com
Noa Kuperberg
Jun 15, 2026 Place Azure Observability Blog
766Views
1like
2Comments