🧠 Introduction
Monitoring AI applications is no longer just about uptime and errors. With generative AI workloads, we need to understand how well our AI is performing, how much it costs, and whether it’s safe and responsible.
Azure AI Foundry provides integrated observability tools that connect Application Insights, continuous evaluation, and custom dashboards, giving you end-to-end visibility into your AI application.
By combining Azure Monitor’s baseline alerting framework with AI Foundry’s built-in metrics and telemetry, teams can achieve proactive monitoring aligned with Microsoft’s Azure Baseline Monitoring Alerts guidance.
💡 Why Monitoring Matters for AI Applications
| Area | Traditional Apps | Generative AI Apps |
|---|---|---|
| Performance | API latency, failures, uptime | Response time per model call, model version performance |
| Cost | Compute, storage | Token usage & model inference cost |
| Quality | Functional correctness | Response quality, helpfulness, accuracy |
| Safety | Security vulnerabilities | Harmful output detection, policy violations |
| Drift | Version mismatches | Model drift, prompt sensitivity changes |
🎯 Generative AI monitoring = traditional metrics + new dimensions of intelligence, quality, and safety.
🧩 Capabilities of Azure AI Foundry for Monitoring
Azure AI Foundry provides a layered approach for observability:
| Capability | Description | Purpose |
|---|---|---|
| Application Insights Integration | Connects your project to Azure Monitor for request, trace, and exception telemetry. | Core telemetry collection |
| Built-in Application Analytics Dashboard | Visualizes latency, token usage, exception rate, and response quality. | Quick operational visibility |
| Continuous Evaluation for Agents | Automatically evaluates AI responses for quality, safety, and accuracy. | Quality monitoring |
| Kusto (KQL) Query Access | Drill down into raw telemetry data using queries. | Deep analysis |
| Alerts via Azure Monitor | Create rules that trigger based on KQL results (e.g., latency > 2 s). | Proactive issue detection |
| Customizable Workbooks | Modify dashboards or create your own visualizations. | Team-specific insights |
Visual Overview of Monitoring Architecture:
Azure AI Foundry gives you a complete monitoring framework for your AI workloads — combining traditional performance metrics with AI-specific signals like token usage, output quality, and safety.
🧩 Step 1 — Connect Azure AI Foundry to Application Insights
Azure AI Foundry does not store monitoring data by itself — it integrates with Azure Monitor Application Insights, which is part of the broader Azure Monitor ecosystem.
This integration enables your AI applications to automatically send logs, traces, exceptions, and custom metrics such as latency, token consumption, and evaluation scores.
🔧 How to Connect
- Open your AI Foundry project
  - Sign in to the Azure AI Foundry portal and navigate to your project.
- Navigate to Monitoring
  - In the left-side panel, click Monitoring → Application Analytics.
- Connect an Application Insights resource
  - If you already have one in your subscription, select it from the dropdown.
  - Otherwise, click Create new Application Insights resource.
  - Choose a name, resource group, and region (ideally the same region as your AI Foundry project to minimize latency).
- Confirm the connection
  - Once connected, telemetry from your AI applications will begin flowing automatically.
  - You can verify this in a few minutes by opening the Application Analytics dashboard or by checking your Application Insights → Logs section.
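If you prefer to set up or verify the connection from code, the Azure AI Foundry SDK can return the connection string of the Application Insights resource attached to your project. The sketch below is a minimal example and assumes a recent `azure-ai-projects` package whose client accepts the project endpoint and exposes `telemetry.get_connection_string()`; the endpoint value is a placeholder.

```python
from azure.ai.projects import AIProjectClient
from azure.identity import DefaultAzureCredential
from azure.monitor.opentelemetry import configure_azure_monitor

# Connect to your Azure AI Foundry project (replace the endpoint with your own).
project = AIProjectClient(
    endpoint="https://<your-foundry-resource>.services.ai.azure.com/api/projects/<your-project>",
    credential=DefaultAzureCredential(),
)

# Retrieve the Application Insights connection string attached to the project
# and route OpenTelemetry logs, traces, and metrics to it.
connection_string = project.telemetry.get_connection_string()
configure_azure_monitor(connection_string=connection_string)
```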
⚙️ Step 2 — Instrument Your AI Application Code
Connecting the resource enables telemetry, but to get meaningful data (e.g., token usage, quality scores, and latency), you must instrument your application code.
Example (Python SDK)
```python
import logging

from azure.monitor.opentelemetry import configure_azure_monitor

# Configure Application Insights telemetry
configure_azure_monitor(connection_string="InstrumentationKey=<YOUR_KEY>")

logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)

# Example: log token usage and model metadata.
# Attributes passed via `extra` are exported as customDimensions in Application Insights.
logger.info("AI Call", extra={
    "model_version": "gpt-4",
    "prompt_type": "customer_support",
    "input_tokens": 180,
    "output_tokens": 256,
})
```
Add custom dimensions like model_version, prompt_type, or user_segment — they make your dashboards far more insightful.
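In practice, those values should come from the model response itself rather than being hard-coded. The following is a hedged sketch, assuming the `openai` Python package (v1.x) pointed at an Azure OpenAI deployment; the endpoint, API key, API version, and deployment name are placeholders, and the token counts are read from the `usage` block the service returns.

```python
import logging

from openai import AzureOpenAI  # assumes openai>=1.0

logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)

client = AzureOpenAI(
    azure_endpoint="https://<your-openai-resource>.openai.azure.com",
    api_key="<YOUR_API_KEY>",
    api_version="2024-06-01",
)

def answer(prompt: str, prompt_type: str) -> str:
    response = client.chat.completions.create(
        model="<your-deployment-name>",
        messages=[{"role": "user", "content": prompt}],
    )
    # Log the real token counts reported by the service as custom dimensions.
    logger.info("AI Call", extra={
        "model_version": response.model,
        "prompt_type": prompt_type,
        "input_tokens": response.usage.prompt_tokens,
        "output_tokens": response.usage.completion_tokens,
    })
    return response.choices[0].message.content
```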
📊 Step 3 — Explore the Application Analytics Dashboard
Once telemetry flows, open your project → Monitoring → Application Analytics.
You’ll see charts for:
- Latency (average response time per request)
- Token usage (input/output tokens per call)
- Failure rate and exceptions
- Quality score (if continuous evaluation is enabled)
How to Use It
- Use the time range filter (top-right) to isolate recent runs.
- Click on any tile → “Open Query” to see the underlying KQL query.
- Clone or edit the dashboard to add your own metrics.
🔍 Step 4 — Analyze Data Using Kusto (KQL)
For deeper diagnostics, open Application Insights → Logs and write KQL queries to analyze data.
Common Use Cases
| Goal | Sample Query |
|---|---|
| Identify top slow endpoints | `requests \| summarize avg_duration = avg(duration) by name \| top 10 by avg_duration desc` |
| Track token usage over time | `customMetrics \| where name == "output_tokens" \| summarize total_tokens = sum(valueSum) by bin(timestamp, 1h)` |
| Monitor safety scores | `customMetrics \| where name == "safety_score" \| summarize avg_score = avg(valueSum / valueCount) by bin(timestamp, 1d)` |
The metric names above (for example `output_tokens` and `safety_score`) are samples; adjust them to match the custom metrics and dimensions your application actually emits.
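The same queries can also be run from code, which is useful for scheduled reports or CI checks. Below is a minimal sketch using the `azure-monitor-query` package against the Log Analytics workspace behind a workspace-based Application Insights resource (where request telemetry lands in the `AppRequests` table); the workspace ID is a placeholder.

```python
from datetime import timedelta

from azure.identity import DefaultAzureCredential
from azure.monitor.query import LogsQueryClient

client = LogsQueryClient(DefaultAzureCredential())

# Average request duration per operation over the last 24 hours.
query = """
AppRequests
| summarize avg_duration = avg(DurationMs) by Name
| top 10 by avg_duration desc
"""

response = client.query_workspace(
    workspace_id="<YOUR_LOG_ANALYTICS_WORKSPACE_ID>",
    query=query,
    timespan=timedelta(days=1),
)

for table in response.tables:
    for row in table.rows:
        print(row)
```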
🚨 Step 5 — Set Alerts for Critical Events
Don’t wait to find out something went wrong — configure alerts in Azure Monitor.
Create an Alert
- Go to your Application Insights resource.
- Select Alerts → Create → Alert rule.
- Use a KQL query or a built-in metric condition, such as:
  - Latency > 3 s
  - Error rate > 5%
  - Token usage > 500K per hour
- Attach an Action Group to notify via email, Microsoft Teams, or webhook.
Flow showing App Insights → Alert → Teams Notification → DevOps Action.
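If the Action Group calls a webhook, your receiver gets the alert payload and can forward it or kick off remediation. The sketch below is a hedged example that assumes the Azure Monitor common alert schema for the payload fields and a Microsoft Teams incoming webhook URL as the destination; both are placeholders you would swap for your own.

```python
import requests

TEAMS_WEBHOOK_URL = "https://<your-tenant>.webhook.office.com/webhookb2/<id>"  # placeholder

def handle_alert(payload: dict) -> None:
    # Field names follow the Azure Monitor common alert schema.
    essentials = payload["data"]["essentials"]
    message = (
        f"Alert '{essentials['alertRule']}' is {essentials['monitorCondition']} "
        f"(severity {essentials['severity']}) at {essentials['firedDateTime']}"
    )
    # Forward a simple text message to a Teams incoming webhook.
    requests.post(TEAMS_WEBHOOK_URL, json={"text": message}, timeout=10)
```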
🧮 Step 6 — Optimize & Review Regularly
Monitoring isn’t “set and forget.” Review metrics weekly or monthly to ensure:
- Costs (token consumption) remain within budget.
- Quality scores meet or exceed expectations.
- Latency remains consistent across model versions.
- Safety metrics show no rise in policy violations.
Example Review Table
| Metric | Target | Actual | Trend | Action |
|---|---|---|---|---|
| Latency (ms) | < 2500 ms | 3100 ms | ⬆️ Increasing | Optimize prompt size |
| Quality Score | > 0.9 | 0.92 | ➡️ Stable | ✅ |
| Token Usage (per hour) | < 400K | 480K | ⬆️ Increasing | Review model selection |
| Safety Violations | < 1% | 0.8% | ➡️ Stable | ✅ |
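A lightweight way to keep this review honest is to script the comparison. The snippet below is a hypothetical sketch; the targets mirror the table above, and the observed values would normally come from your dashboards or KQL queries rather than being hard-coded.

```python
# Targets from the review table; observed values are stand-ins for query results.
targets = {
    "latency_ms": 2500,
    "quality_score": 0.9,
    "tokens_per_hour": 400_000,
    "safety_violation_rate": 0.01,
}
observed = {
    "latency_ms": 3100,
    "quality_score": 0.92,
    "tokens_per_hour": 480_000,
    "safety_violation_rate": 0.008,
}

for metric, target in targets.items():
    value = observed[metric]
    # quality_score should stay above target; everything else should stay below it.
    ok = value >= target if metric == "quality_score" else value <= target
    print(f"{metric}: {value} vs target {target} -> {'OK' if ok else 'REVIEW'}")
```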
🧠 Step 7 — Close the Feedback Loop (MLOps Integration)
Finally, connect your monitoring insights back into your MLOps workflow:
- When quality drops, trigger re-evaluation or fine-tuning pipelines.
- When cost spikes, switch to a lighter model (e.g., GPT-4o mini).
- When safety issues arise, automatically disable affected agents until verified.
Monitoring → Alert → Azure DevOps Pipeline / Retraining Workflow → Model Update → Back to Monitoring
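As one concrete, hedged example of that loop, an alert handler can queue an Azure DevOps pipeline run when the quality score drops below a threshold. The sketch below uses the Azure DevOps REST API (Pipelines "Runs - Run Pipeline" endpoint); the organization, project, pipeline ID, and personal access token are placeholders.

```python
import requests

ORG = "<your-organization>"
PROJECT = "<your-project>"
PIPELINE_ID = 42  # hypothetical pipeline ID for a re-evaluation or fine-tuning pipeline
PAT = "<YOUR_PERSONAL_ACCESS_TOKEN>"

def trigger_reevaluation_if_needed(quality_score: float, threshold: float = 0.9) -> None:
    if quality_score >= threshold:
        return
    url = (
        f"https://dev.azure.com/{ORG}/{PROJECT}/_apis/pipelines/"
        f"{PIPELINE_ID}/runs?api-version=7.1-preview.1"
    )
    # Azure DevOps REST calls authenticate with basic auth: empty username plus a PAT.
    response = requests.post(url, json={}, auth=("", PAT), timeout=30)
    response.raise_for_status()
    print(f"Queued pipeline run: {response.json().get('id')}")
```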