🧠 Introduction
Monitoring AI applications is no longer just about uptime and errors. With generative AI workloads, we need to understand how well our AI is performing, how much it costs, and whether it’s safe and responsible.
Azure AI Foundry provides integrated observability tools that connect Application Insights, continuous evaluation, and custom dashboards, giving you end-to-end visibility into your AI application.
By combining Azure Monitor’s baseline alerting framework with AI Foundry’s built-in metrics and telemetry, teams can achieve proactive monitoring aligned with Microsoft’s Azure Baseline Monitoring Alerts guidance.
💡 Why Monitoring Matters for AI Applications
| Area | Traditional Apps | Generative AI Apps |
|---|---|---|
| Performance | API latency, failures, uptime | Response time per model call, model version performance |
| Cost | Compute, storage | Token usage & model inference cost |
| Quality | Functional correctness | Response quality, helpfulness, accuracy |
| Safety | Security vulnerabilities | Harmful output detection, policy violations |
| Drift | Version mismatches | Model drift, prompt sensitivity changes |
🎯 Generative AI monitoring = traditional metrics + new dimensions of intelligence, quality, and safety.
🧩 Capabilities of Azure AI Foundry for Monitoring
Azure AI Foundry provides a layered approach for observability:
| Capability | Description | Purpose |
|---|---|---|
| Application Insights Integration | Connects your project to Azure Monitor for request, trace, and exception telemetry. | Core telemetry collection |
| Built-in Application Analytics Dashboard | Visualizes latency, token usage, exception rate, and response quality. | Quick operational visibility |
| Continuous Evaluation for Agents | Automatically evaluates AI responses for quality, safety, and accuracy. | Quality monitoring |
| Kusto (KQL) Query Access | Drill down into raw telemetry data using queries. | Deep analysis |
| Alerts via Azure Monitor | Create rules that trigger based on KQL results (e.g., latency > 2 s). | Proactive issue detection |
| Customizable Workbooks | Modify dashboards or create your own visualizations. | Team-specific insights |
Visual Overview of Monitoring Architecture:
Azure AI Foundry gives you a complete monitoring framework for your AI workloads — combining traditional performance metrics with AI-specific signals like token usage, output quality, and safety.
🧩 Step 1 — Connect Azure AI Foundry to Application Insights
Azure AI Foundry does not store monitoring data by itself — it integrates with Azure Monitor Application Insights, which is part of the broader Azure Monitor ecosystem.
This integration enables your AI applications to automatically send logs, traces, exceptions, and custom metrics such as latency, token consumption, and evaluation scores.
🔧 How to Connect
- Open your AI Foundry project
  - Sign in to the Azure AI Foundry portal and navigate to your project.
- Navigate to Monitoring
  - In the left-side panel, click Monitoring → Application Analytics.
- Connect an Application Insights resource
  - If you already have one in your subscription, select it from the dropdown.
  - Otherwise, click Create new Application Insights resource.
  - Choose a name, resource group, and region (ideally the same region as your AI Foundry project to minimize latency).
- Confirm the connection
  - Once connected, telemetry from your AI applications will begin flowing automatically.
  - You can verify this in a few minutes by opening the Application Analytics dashboard or by checking your Application Insights → Logs section.
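If you prefer to set up or verify the connection from code, the Azure AI Foundry SDK can return the connection string of the Application Insights resource attached to your project. The sketch below is a minimal example and assumes a recent `azure-ai-projects` package whose client accepts the project endpoint and exposes `telemetry.get_connection_string()`; the endpoint value is a placeholder.

```python
from azure.ai.projects import AIProjectClient
from azure.identity import DefaultAzureCredential
from azure.monitor.opentelemetry import configure_azure_monitor

# Connect to your Azure AI Foundry project (replace the endpoint with your own).
project = AIProjectClient(
    endpoint="https://<your-foundry-resource>.services.ai.azure.com/api/projects/<your-project>",
    credential=DefaultAzureCredential(),
)

# Retrieve the Application Insights connection string attached to the project
# and route OpenTelemetry logs, traces, and metrics to it.
connection_string = project.telemetry.get_connection_string()
configure_azure_monitor(connection_string=connection_string)
```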
⚙️ Step 2 — Instrument Your AI Application Code
Connecting the resource enables telemetry, but to get meaningful data (e.g., token usage, quality scores, and latency), you must instrument your application code.
Example (Python SDK)
```python
import logging

from azure.monitor.opentelemetry import configure_azure_monitor

# Configure Application Insights telemetry
configure_azure_monitor(connection_string="InstrumentationKey=<YOUR_KEY>")

logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)

# Example: log token usage and model metadata.
# Attributes passed via `extra` are exported as customDimensions in Application Insights.
logger.info("AI Call", extra={
    "model_version": "gpt-4",
    "prompt_type": "customer_support",
    "input_tokens": 180,
    "output_tokens": 256,
})
```
Add custom dimensions like model_version, prompt_type, or user_segment — they make your dashboards far more insightful.
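In practice, those values should come from the model response itself rather than being hard-coded. The following is a hedged sketch, assuming the `openai` Python package (v1.x) pointed at an Azure OpenAI deployment; the endpoint, API key, API version, and deployment name are placeholders, and the token counts are read from the `usage` block the service returns.

```python
import logging

from openai import AzureOpenAI  # assumes openai>=1.0

logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)

client = AzureOpenAI(
    azure_endpoint="https://<your-openai-resource>.openai.azure.com",
    api_key="<YOUR_API_KEY>",
    api_version="2024-06-01",
)

def answer(prompt: str, prompt_type: str) -> str:
    response = client.chat.completions.create(
        model="<your-deployment-name>",
        messages=[{"role": "user", "content": prompt}],
    )
    # Log the real token counts reported by the service as custom dimensions.
    logger.info("AI Call", extra={
        "model_version": response.model,
        "prompt_type": prompt_type,
        "input_tokens": response.usage.prompt_tokens,
        "output_tokens": response.usage.completion_tokens,
    })
    return response.choices[0].message.content
```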
📊 Step 3 — Explore the Application Analytics Dashboard
Once telemetry flows, open your project → Monitoring → Application Analytics.
You’ll see charts for:
- Latency (average response time per request)
- Token usage (input/output tokens per call)
- Failure rate and exceptions
- Quality score (if continuous evaluation is enabled)
How to Use It
- Use the time range filter (top-right) to isolate recent runs.
- Click on any tile → “Open Query” to see the underlying KQL query.
- Clone or edit the dashboard to add your own metrics.
🔍 Step 4 — Analyze Data Using Kusto (KQL)
For deeper diagnostics, open Application Insights → Logs and write KQL queries to analyze data.
Common Use Cases
| Goal | Sample Query |
|---|---|
| Identify top slow endpoints | `requests \| summarize avg_duration = avg(duration) by name \| top 10 by avg_duration desc` |
| Track token usage over time | `customMetrics \| where name == "output_tokens" \| summarize total_tokens = sum(valueSum) by bin(timestamp, 1h)` |
| Monitor safety scores | `customMetrics \| where name == "safety_score" \| summarize avg_score = avg(valueSum / valueCount) by bin(timestamp, 1d)` |
The metric names above (for example `output_tokens` and `safety_score`) are samples; adjust them to match the custom metrics and dimensions your application actually emits.
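The same queries can also be run from code, which is useful for scheduled reports or CI checks. Below is a minimal sketch using the `azure-monitor-query` package against the Log Analytics workspace behind a workspace-based Application Insights resource (where request telemetry lands in the `AppRequests` table); the workspace ID is a placeholder.

```python
from datetime import timedelta

from azure.identity import DefaultAzureCredential
from azure.monitor.query import LogsQueryClient

client = LogsQueryClient(DefaultAzureCredential())

# Average request duration per operation over the last 24 hours.
query = """
AppRequests
| summarize avg_duration = avg(DurationMs) by Name
| top 10 by avg_duration desc
"""

response = client.query_workspace(
    workspace_id="<YOUR_LOG_ANALYTICS_WORKSPACE_ID>",
    query=query,
    timespan=timedelta(days=1),
)

for table in response.tables:
    for row in table.rows:
        print(row)
```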
🚨 Step 5 — Set Alerts for Critical Events
Don’t wait to find out something went wrong — configure alerts in Azure Monitor.
Create an Alert
- Go to your Application Insights resource.
- Select Alerts → Create → Alert rule.
- Use a KQL query or a built-in metric condition, such as:
  - Latency > 3 s
  - Error rate > 5%
  - Token usage > 500K per hour
- Attach an Action Group to notify via email, Microsoft Teams, or webhook.
Flow showing App Insights → Alert → Teams Notification → DevOps Action.
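If the Action Group calls a webhook, your receiver gets the alert payload and can forward it or kick off remediation. The sketch below is a hedged example that assumes the Azure Monitor common alert schema for the payload fields and a Microsoft Teams incoming webhook URL as the destination; both are placeholders you would swap for your own.

```python
import requests

TEAMS_WEBHOOK_URL = "https://<your-tenant>.webhook.office.com/webhookb2/<id>"  # placeholder

def handle_alert(payload: dict) -> None:
    # Field names follow the Azure Monitor common alert schema.
    essentials = payload["data"]["essentials"]
    message = (
        f"Alert '{essentials['alertRule']}' is {essentials['monitorCondition']} "
        f"(severity {essentials['severity']}) at {essentials['firedDateTime']}"
    )
    # Forward a simple text message to a Teams incoming webhook.
    requests.post(TEAMS_WEBHOOK_URL, json={"text": message}, timeout=10)
```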
🧮 Step 6 — Optimize & Review Regularly
Monitoring isn’t “set and forget.” Review metrics weekly or monthly to ensure:
- Costs (token consumption) remain within budget.
- Quality scores meet or exceed expectations.
- Latency remains consistent across model versions.
- Safety metrics show no rise in policy violations.
Example Review Table
| Metric | Target | Actual | Trend | Action |
|---|---|---|---|---|
| Latency (ms) | < 2500 ms | 3100 ms | ⬆️ Increasing | Optimize prompt size |
| Quality Score | > 0.9 | 0.92 | ➡️ Stable | ✅ |
| Token Usage (per hour) | < 400K | 480K | ⬆️ Increasing | Review model selection |
| Safety Violations | < 1% | 0.8% | ➡️ Stable | ✅ |
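A lightweight way to keep this review honest is to script the comparison. The snippet below is a hypothetical sketch; the targets mirror the table above, and the observed values would normally come from your dashboards or KQL queries rather than being hard-coded.

```python
# Targets from the review table; observed values are stand-ins for query results.
targets = {
    "latency_ms": 2500,
    "quality_score": 0.9,
    "tokens_per_hour": 400_000,
    "safety_violation_rate": 0.01,
}
observed = {
    "latency_ms": 3100,
    "quality_score": 0.92,
    "tokens_per_hour": 480_000,
    "safety_violation_rate": 0.008,
}

for metric, target in targets.items():
    value = observed[metric]
    # quality_score should stay above target; everything else should stay below it.
    ok = value >= target if metric == "quality_score" else value <= target
    print(f"{metric}: {value} vs target {target} -> {'OK' if ok else 'REVIEW'}")
```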
🧠 Step 7 — Close the Feedback Loop (MLOps Integration)
Finally, connect your monitoring insights back into your MLOps workflow:
- When quality drops, trigger re-evaluation or fine-tuning pipelines.
- When cost spikes, switch to a lighter model (e.g., GPT-4o mini).
- When safety issues arise, automatically disable affected agents until verified.
Monitoring → Alert → Azure DevOps Pipeline / Retraining Workflow → Model Update → Back to Monitoring
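As one concrete, hedged example of that loop, an alert handler can queue an Azure DevOps pipeline run when the quality score drops below a threshold. The sketch below uses the Azure DevOps REST API (Pipelines "Runs - Run Pipeline" endpoint); the organization, project, pipeline ID, and personal access token are placeholders.

```python
import requests

ORG = "<your-organization>"
PROJECT = "<your-project>"
PIPELINE_ID = 42  # hypothetical pipeline ID for a re-evaluation or fine-tuning pipeline
PAT = "<YOUR_PERSONAL_ACCESS_TOKEN>"

def trigger_reevaluation_if_needed(quality_score: float, threshold: float = 0.9) -> None:
    if quality_score >= threshold:
        return
    url = (
        f"https://dev.azure.com/{ORG}/{PROJECT}/_apis/pipelines/"
        f"{PIPELINE_ID}/runs?api-version=7.1-preview.1"
    )
    # Azure DevOps REST calls authenticate with basic auth: empty username plus a PAT.
    response = requests.post(url, json={}, auth=("", PAT), timeout=30)
    response.raise_for_status()
    print(f"Queued pipeline run: {response.json().get('id')}")
```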