Azure AI Foundry Blog

Monitoring Generative AI Applications with Azure AI Foundry

RRAJMSFT (Microsoft)
Nov 07, 2025

đź§­ Introduction

Monitoring AI applications is no longer just about uptime and errors. With generative AI workloads, we need to understand how well our AI is performing, how much it costs, and whether it’s safe and responsible.

Azure AI Foundry provides integrated observability tools that connect Application Insights, continuous evaluation, and custom dashboards, giving you end-to-end visibility into your AI application.

By combining Azure Monitor's baseline alerting framework with AI Foundry's built-in metrics and telemetry, teams can achieve proactive monitoring aligned with Microsoft's Azure Baseline Monitoring Alerts.

đź’ˇ Why Monitoring Matters for AI Applications

| Area | Traditional Apps | Generative AI Apps |
| --- | --- | --- |
| Performance | API latency, failures, uptime | Response time per model call, model version performance |
| Cost | Compute, storage | Token usage & model inference cost |
| Quality | Functional correctness | Response quality, helpfulness, accuracy |
| Safety | Security vulnerabilities | Harmful output detection, policy violations |
| Drift | Version mismatches | Model drift, prompt sensitivity changes |

🎯 Generative AI monitoring = traditional metrics + new dimensions of intelligence, quality, and safety.

đź§© Capabilities of Azure AI Foundry for Monitoring

Azure AI Foundry provides a layered approach for observability:

| Capability | Description | Purpose |
| --- | --- | --- |
| Application Insights Integration | Connects your project to Azure Monitor for request, trace, and exception telemetry. | Core telemetry collection |
| Built-in Application Analytics Dashboard | Visualizes latency, token usage, exception rate, and response quality. | Quick operational visibility |
| Continuous Evaluation for Agents | Automatically evaluates AI responses for quality, safety, and accuracy. | Quality monitoring |
| Kusto (KQL) Query Access | Drill down into raw telemetry data using queries. | Deep analysis |
| Alerts via Azure Monitor | Create rules that trigger based on KQL results (e.g., latency > 2 s). | Proactive issue detection |
| Customizable Workbooks | Modify dashboards or create your own visualizations. | Team-specific insights |

Visual overview of the monitoring architecture

Azure AI Foundry gives you a complete monitoring framework for your AI workloads — combining traditional performance metrics with AI-specific signals like token usage, output quality, and safety.

🧩 Step 1 — Connect Azure AI Foundry to Application Insights

Azure AI Foundry does not store monitoring data by itself — it integrates with Azure Monitor Application Insights, which is part of the broader Azure Monitor ecosystem.

This integration enables your AI applications to automatically send logs, traces, exceptions, and custom metrics such as latency, token consumption, and evaluation scores.

đź”§ How to Connect

  1. Open your AI Foundry project
  2. Navigate to Monitoring
    • In the left-side panel, click Monitoring → Application Analytics.
  3. Connect an Application Insights resource
    • If you already have one in your subscription, select it from the dropdown.
    • Otherwise, click Create new Application Insights resource.
    • Choose a name, resource group, and region (ideally the same region as your AI Foundry project to minimize latency).
  4. Confirm the connection
    • Once connected, telemetry from your AI applications will begin flowing automatically.
    • You can verify this in a few minutes by opening the Application Analytics dashboard, by checking your Application Insights → Logs section, or programmatically as sketched below.
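
If you prefer to verify or reuse the connection from code, the project exposes the connection string of the attached Application Insights resource. A minimal sketch, assuming the azure-ai-projects and azure-identity packages (the client constructor and endpoint format differ slightly across SDK versions):

from azure.identity import DefaultAzureCredential
from azure.ai.projects import AIProjectClient

# Connect to the AI Foundry project (endpoint shown on the project overview page)
project_client = AIProjectClient(
    endpoint="<YOUR_AI_FOUNDRY_PROJECT_ENDPOINT>",
    credential=DefaultAzureCredential(),
)

# Connection string of the Application Insights resource attached in the Monitoring blade
app_insights_connection_string = project_client.telemetry.get_connection_string()
print(app_insights_connection_string)

You can pass this same connection string to configure_azure_monitor in the next step instead of hard-coding it.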

⚙️ Step 2 — Instrument Your AI Application Code

Connecting the resource enables telemetry, but to get meaningful data (e.g., token usage, quality scores, and latency), you must instrument your application code.

Example (Python SDK)

import logging

from azure.monitor.opentelemetry import configure_azure_monitor

# Configure Application Insights telemetry (copy the connection string from
# your Application Insights resource's overview page)
configure_azure_monitor(connection_string="<YOUR_APPLICATION_INSIGHTS_CONNECTION_STRING>")

logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)

# Example: log token usage and model metadata.
# The extra fields are exported to Application Insights as customDimensions.
logger.info(
    "AI Call",
    extra={
        "model_version": "gpt-4",
        "prompt_type": "customer_support",
        "input_tokens": 180,
        "output_tokens": 256,
    },
)

Add custom dimensions like model_version, prompt_type, or user_segment — they make your dashboards far more insightful.
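
Because configure_azure_monitor also wires up OpenTelemetry tracing, you can optionally wrap each model call in a span so latency and metadata show up as dependency telemetry. A minimal sketch (the span name and attribute keys below are illustrative choices, not fixed conventions):

from opentelemetry import trace

tracer = trace.get_tracer(__name__)

# Wrap a model call in a span; it is exported to Application Insights as
# dependency telemetry, with the attributes appearing as customDimensions.
with tracer.start_as_current_span("chat_completion") as span:
    span.set_attribute("model_version", "gpt-4")
    span.set_attribute("prompt_type", "customer_support")

    # response = client.chat.completions.create(...)  # your model call goes here

    span.set_attribute("input_tokens", 180)
    span.set_attribute("output_tokens", 256)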

📊 Step 3 — Explore the Application Analytics Dashboard

Once telemetry flows, open your project → Monitoring → Application Analytics.
You’ll see charts for:

  • Latency (average response time per request)
  • Token usage (input/output tokens per call)
  • Failure rate and exceptions
  • Quality score (if continuous evaluation is enabled)

How to Use It

  • Use the time range filter (top-right) to isolate recent runs.
  • Click on any tile → “Open Query” to see the underlying KQL query.
  • Clone or edit the dashboard to add your own metrics.

🔍 Step 4 — Analyze Data Using Kusto (KQL)

For deeper diagnostics, open Application Insights → Logs and write KQL queries to analyze data.

Common Use Cases

For example (adjust metric names to match the dimensions and metrics you emit; a sketch for running these queries from code follows the list):

  • Identify top slow endpoints: `requests | summarize avg(duration) by name | top 10 by avg_duration`
  • Track token usage over time: `customMetrics | where name contains "token" | summarize sum(value) by bin(timestamp, 1h)`
  • Monitor safety scores: `customMetrics | where name contains "safety" | summarize avg(value) by bin(timestamp, 1h)`
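
You can also run the same KQL from code. A minimal sketch using the azure-monitor-query and azure-identity packages, assuming a workspace-based Application Insights resource (when you query the backing Log Analytics workspace directly, the classic requests table surfaces as AppRequests):

from datetime import timedelta

from azure.identity import DefaultAzureCredential
from azure.monitor.query import LogsQueryClient

client = LogsQueryClient(DefaultAzureCredential())

# AppRequests is the workspace-level name of the Application Insights requests table
query = "AppRequests | summarize avg(DurationMs) by Name | top 10 by avg_DurationMs"

response = client.query_workspace(
    workspace_id="<YOUR_LOG_ANALYTICS_WORKSPACE_ID>",
    query=query,
    timespan=timedelta(hours=24),
)

for table in response.tables:
    for row in table.rows:
        print(row)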

🚨 Step 5 — Set Alerts for Critical Events

Don’t wait to find out something went wrong — configure alerts in Azure Monitor.

Create an Alert

  1. Go to your Application Insights resource.
  2. Select Alerts → Create → Alert rule.
  3. Use a KQL query or a built-in metric condition, such as:
    • Latency > 3 s
    • Error rate > 5%
    • Token usage > 500K per hour
  4. Attach an Action Group to notify via email, Microsoft Teams, or webhook.

Flow showing App Insights → Alert → Teams Notification → DevOps Action.
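
If you prefer to define the alert rule in code rather than the portal, the azure-mgmt-monitor package can create it. A minimal sketch for the latency condition, assuming standard Application Insights metrics; the resource IDs, names, and action group below are placeholders:

from azure.identity import DefaultAzureCredential
from azure.mgmt.monitor import MonitorManagementClient
from azure.mgmt.monitor.models import (
    MetricAlertAction,
    MetricAlertResource,
    MetricAlertSingleResourceMultipleMetricCriteria,
    MetricCriteria,
)

subscription_id = "<SUBSCRIPTION_ID>"
app_insights_id = (
    "/subscriptions/<SUBSCRIPTION_ID>/resourceGroups/<RESOURCE_GROUP>"
    "/providers/microsoft.insights/components/<APP_INSIGHTS_NAME>"
)

client = MonitorManagementClient(DefaultAzureCredential(), subscription_id)

# Alert when average request duration exceeds 3 s (3000 ms) over a 15-minute window
client.metric_alerts.create_or_update(
    resource_group_name="<RESOURCE_GROUP>",
    rule_name="ai-app-high-latency",
    parameters=MetricAlertResource(
        location="global",
        description="Average latency above 3 s",
        severity=2,
        enabled=True,
        scopes=[app_insights_id],
        evaluation_frequency="PT5M",
        window_size="PT15M",
        criteria=MetricAlertSingleResourceMultipleMetricCriteria(
            all_of=[
                MetricCriteria(
                    name="HighLatency",
                    metric_name="requests/duration",
                    metric_namespace="microsoft.insights/components",
                    operator="GreaterThan",
                    threshold=3000,
                    time_aggregation="Average",
                )
            ]
        ),
        actions=[MetricAlertAction(action_group_id="<ACTION_GROUP_RESOURCE_ID>")],
    ),
)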

🧮 Step 6 — Optimize & Review Regularly

Monitoring isn’t “set and forget.” Review metrics weekly or monthly to ensure:

  • Costs (token consumption) remain within budget.
  • Quality scores meet or exceed expectations.
  • Latency remains consistent across model versions.
  • Safety metrics show no rise in policy violations.

Example Review Table

| Metric | Target | Actual | Trend | Action |
| --- | --- | --- | --- | --- |
| Latency | < 2500 ms | 3100 ms | ⬆️ Increasing | Optimize prompt size |
| Quality Score | > 0.9 | 0.92 | ➡️ Stable | ✅ |
| Token Usage (per hour) | < 400K | 480K | ⬆️ Increasing | Review model selection |
| Safety Violations | < 1% | 0.8% | ➡️ Stable | ✅ |
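
A lightweight way to keep this review honest is to script the comparison. A small illustrative sketch using the example values above (in practice, pull the actuals from your KQL queries or dashboard exports):

# Illustrative weekly review check; replace the hard-coded actuals with values
# pulled from your KQL queries or exported dashboard metrics.
targets = {
    "latency_ms": ("<", 2500),
    "quality_score": (">", 0.9),
    "tokens_per_hour": ("<", 400_000),
    "safety_violation_rate": ("<", 0.01),
}

actuals = {
    "latency_ms": 3100,
    "quality_score": 0.92,
    "tokens_per_hour": 480_000,
    "safety_violation_rate": 0.008,
}

for metric, (op, target) in targets.items():
    actual = actuals[metric]
    ok = actual < target if op == "<" else actual > target
    status = "OK" if ok else "NEEDS ACTION"
    print(f"{metric}: actual={actual}, target={op}{target} -> {status}")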

🧭 Step 7 — Close the Feedback Loop (MLOps Integration)

Finally, connect your monitoring insights back into your MLOps workflow:

  • When quality drops, trigger re-evaluation or fine-tuning pipelines.
  • When cost spikes, switch to a lighter model (e.g., GPT-4o mini).
  • When safety issues arise, automatically disable affected agents until verified.

Monitoring → Alert → Azure DevOps Pipeline / Retraining Workflow → Model Update → Back to Monitoring
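
One simple way to wire an alert into this loop is a small webhook handler that queues a retraining or re-evaluation pipeline through the Azure DevOps REST API. A minimal sketch using the requests package; the organization, project, pipeline ID, and personal access token are placeholders, and in production you would validate the alert payload and keep the credential in Key Vault or use a managed identity:

import os

import requests

# Called by an Azure Monitor action group webhook when an alert fires.
def trigger_retraining_pipeline(alert_payload: dict) -> None:
    organization = "<AZDO_ORG>"
    project = "<AZDO_PROJECT>"
    pipeline_id = 42  # ID of the retraining / re-evaluation pipeline (placeholder)
    pat = os.environ["AZDO_PAT"]  # personal access token with Build (read & execute) scope

    url = (
        f"https://dev.azure.com/{organization}/{project}"
        f"/_apis/pipelines/{pipeline_id}/runs?api-version=7.1"
    )

    # Pass alert context to the pipeline as template parameters (names are up to you)
    body = {
        "templateParameters": {
            "alertRule": alert_payload.get("data", {})
            .get("essentials", {})
            .get("alertRule", "unknown"),
        }
    }

    response = requests.post(url, json=body, auth=("", pat), timeout=30)
    response.raise_for_status()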

 

Updated Nov 04, 2025
Version 1.0