
Monitoring Azure OpenAI without switching from your existing observability platform

rmmartins
Oct 03, 2025

Recently, one of my customers asked me a simple but powerful question:

“We already use Datadog for observability. We’re not going to move to Azure Monitor and Log Analytics, but can we still monitor our Azure OpenAI usage directly in the portal?”


That made me realize something: this isn’t just one customer’s challenge. Many customers already have an observability stack they like, but they still want quick, free visibility into Azure OpenAI usage and alerts without switching tools. This post shares how you can do exactly that.

The two monitoring paths

When it comes to monitoring Azure OpenAI, there are two main options:

1. The full flow (most powerful, requires Log Analytics): This unlocks correlation, deep queries, and exporting metrics/logs to external tools.

Azure OpenAI Service → Azure Monitor → Log Analytics → KQL, Workbooks, Alerts → integrations like Datadog, Grafana.

2. The lightweight flow (fast, free, no Log Analytics): This is the path we’ll explore, using simple dashboards and quota-based alerts right in the Azure Portal.

Azure OpenAI Service → Azure Monitor (Metrics) → Portal Workbooks + Alerts.

Metrics available in Azure OpenAI

Azure OpenAI publishes several key metrics natively (no ingestion required). According to the official documentation:

  • Processed Inference Tokens → tokens consumed (prompt + completion).
  • Azure OpenAI Requests → total API calls.
  • Request Errors → failed requests (429s, 5xx).
  • Availability Rate → percentage of successful calls.
  • Latency metrics → TTFT (time to first token), TTLB (time to last byte).

You can view these under: AOAI Resource → Monitoring → Metrics.

Azure OpenAI exposes native metrics like tokens, requests, errors, and latency directly in the Azure Portal.
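
If you prefer to pull these numbers from a script instead of the portal blade, the azure-monitor-query Python package can read the same platform metrics. Note that the portal display names (such as Processed Inference Tokens) map to internal metric names, so it is worth listing the metric definitions first. A minimal sketch, assuming `pip install azure-monitor-query azure-identity`, Monitoring Reader access on the resource, and your own resource ID in place of the placeholder:

```python
# Sketch: discover the metric names behind the portal display names
# (e.g. "Processed Inference Tokens") using azure-monitor-query.
# RESOURCE_ID is a placeholder; replace it with your Azure OpenAI resource ID.
from azure.identity import DefaultAzureCredential
from azure.monitor.query import MetricsQueryClient

RESOURCE_ID = (
    "/subscriptions/<sub-id>/resourceGroups/<rg>/providers/"
    "Microsoft.CognitiveServices/accounts/<aoai-account>"
)

client = MetricsQueryClient(DefaultAzureCredential())
for definition in client.list_metric_definitions(RESOURCE_ID):
    print(definition.name, "-", definition.unit)
```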

Quotas: The other half of the picture

Metrics tell you usage. Quotas tell you capacity. Every deployment has fixed Tokens per Minute (TPM) and Requests per Minute (RPM) limits. You can find these under: Azure AI Foundry portal → Deployments → Select Deployment → Rate Limits.

Example:

  • GPT-4.1-mini deployment → 250,000 TPM / 250 RPM

These are the values you’ll compare against metrics and use in alerts.

Each deployment has fixed TPM/RPM quotas. Here, GPT-4.1-mini is capped at 250,000 TPM and 250 RPM.
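
The alert thresholds used later in this post are just fixed percentages of these limits, so it can help to write them down once. A trivial sketch using the example GPT-4.1-mini quota above (swap in your own deployment’s Rate Limits):

```python
# Derive warning/critical alert thresholds from a deployment's fixed quota.
# The quota values are the example GPT-4.1-mini limits from this post.
TPM_QUOTA = 250_000    # tokens per minute
RPM_QUOTA = 250        # requests per minute

WARNING_RATIO = 0.80   # alert before throttling starts
CRITICAL_RATIO = 1.00  # quota saturated

print("TPM warning: ", int(TPM_QUOTA * WARNING_RATIO))   # 200000
print("TPM critical:", int(TPM_QUOTA * CRITICAL_RATIO))  # 250000
print("RPM warning: ", int(RPM_QUOTA * WARNING_RATIO))   # 200
print("RPM critical:", int(RPM_QUOTA * CRITICAL_RATIO))  # 250
```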

Building a lightweight workbook

Even without Log Analytics, you can build a simple workbook to track usage vs quota:

  1. Go to Azure Monitor → Workbooks → + New.
  2. Add a metric visualization for Processed Inference Tokens (Sum).
    • Resource Type: Azure AI Foundry
    • Azure AI Foundry: Select your instance
    • Click Add metric
      • Metric: Processed Inference Tokens
      • Aggregation: Sum
      • Display name: Token Usage vs Quota.
  3. Add another metric for Azure OpenAI Requests (Count).
    • Metric: Azure OpenAI Requests
    • Aggregation: Count
    • Display name: Requests per Minute vs Quota.
  4. Click Run Metrics.
  5. Save as AOAI Usage vs Capacity.

Workbooks let you visualize token and request usage against your deployment’s fixed quotas.
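
If you also want this usage-vs-quota view outside the portal, for example in a scheduled script or a CI check, the same comparison can be reproduced in a few lines. This is a sketch under the same assumptions as the earlier query example (azure-monitor-query and azure-identity installed, Monitoring Reader access, placeholder resource ID), and the metric names shown are assumptions you should confirm against your resource’s metric definitions:

```python
# Sketch: reproduce the workbook's "usage vs quota" comparison in a script.
# Assumptions: placeholder RESOURCE_ID replaced, and the metric names below
# confirmed via list_metric_definitions (they may differ on your resource).
from datetime import timedelta

from azure.identity import DefaultAzureCredential
from azure.monitor.query import MetricsQueryClient, MetricAggregationType

RESOURCE_ID = (
    "/subscriptions/<sub-id>/resourceGroups/<rg>/providers/"
    "Microsoft.CognitiveServices/accounts/<aoai-account>"
)
QUOTAS = {
    "TokenTransaction": 250_000,   # assumed name for Processed Inference Tokens
    "AzureOpenAIRequests": 250,    # assumed name for Azure OpenAI Requests
}

client = MetricsQueryClient(DefaultAzureCredential())
result = client.query_resource(
    RESOURCE_ID,
    metric_names=list(QUOTAS),
    timespan=timedelta(minutes=30),
    granularity=timedelta(minutes=1),
    aggregations=[MetricAggregationType.TOTAL],
)

for metric in result.metrics:
    quota = QUOTAS[metric.name]
    for series in metric.timeseries:
        for point in series.data:
            used = point.total or 0
            print(f"{point.timestamp:%H:%M}  {metric.name:<22}"
                  f"{used:>12,.0f}  ({100 * used / quota:5.1f}% of quota)")
```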

Creating alerts (proactive notification)

From the portal you can also configure alerts directly on metrics:

  1. Go to Azure Monitor → Alerts → + Create → Alert rule.
  2. Scope = your AOAI resource.
  3. Condition step:
    • Signal name = Processed Inference Tokens.
    • Threshold type: Static
    • Value is: Greater than
    • Unit: Count
    • Threshold = 200,000 (warning) or 250,000 (critical).
  4. Actions step:
    • Use Quick Actions → add your email (or Azure mobile push).
    • Or create an Action Group for Teams/webhook integration.
  5. Details step:
    • Name = AOAI-TPM-Warning / AOAI-TPM-Critical.
    • Severity = 2 (Warning) or 0 (Critical).
  6. Review + Create.
  7. Repeat for Azure OpenAI Requests with thresholds of 200 (warning) and 250 (critical).

Alert conditions:

Configure alert conditions directly on metrics. Here, we trigger at 200,000 tokens per minute (80% of quota).

Quick Actions:

Quick Actions let you add email or mobile notifications without creating a full Action Group.

Overview from the Alert:

Give your alert a descriptive name and severity. Here, AOAI-TPM-Warning at Severity 2.
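
If you manage resources with scripts or pipelines, the same warning rule can be defined as code instead of clicking through the portal. Here is a sketch using the azure-mgmt-monitor package, assuming `pip install azure-mgmt-monitor azure-identity`, permission to create alert rules, placeholder IDs replaced with your own, and the metric name confirmed for your resource (TokenTransaction is shown as an assumed name for Processed Inference Tokens):

```python
# Sketch: create the 80% TPM warning alert rule as code.
# Assumptions: placeholder subscription/resource names replaced, permission to
# create alert rules, and the metric_name confirmed for your resource.
from azure.identity import DefaultAzureCredential
from azure.mgmt.monitor import MonitorManagementClient
from azure.mgmt.monitor.models import (
    MetricAlertResource,
    MetricAlertSingleResourceMultipleMetricCriteria,
    MetricCriteria,
)

SUBSCRIPTION_ID = "<sub-id>"
RESOURCE_GROUP = "<rg>"
AOAI_RESOURCE_ID = (
    f"/subscriptions/{SUBSCRIPTION_ID}/resourceGroups/{RESOURCE_GROUP}"
    "/providers/Microsoft.CognitiveServices/accounts/<aoai-account>"
)

client = MonitorManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

alert = MetricAlertResource(
    location="global",
    description="Warn at 80% of the 250k TPM quota",
    severity=2,                          # 2 = Warning, 0 = Critical
    enabled=True,
    scopes=[AOAI_RESOURCE_ID],
    evaluation_frequency="PT1M",
    window_size="PT1M",
    criteria=MetricAlertSingleResourceMultipleMetricCriteria(
        all_of=[
            MetricCriteria(
                name="tpm-warning",
                metric_name="TokenTransaction",  # assumed name; confirm first
                time_aggregation="Total",
                operator="GreaterThan",
                threshold=200_000,
            )
        ]
    ),
    # actions=[...]  # attach an Action Group here for email/Teams/webhooks
)

client.metric_alerts.create_or_update(RESOURCE_GROUP, "AOAI-TPM-Warning", alert)
```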

How this helps with 429 errors

One of the most common issues Azure OpenAI customers face is the dreaded “Too Many Requests” (429) error.

Why it happens:

  • Each deployment enforces hard TPM/RPM quotas.
  • If you send more tokens or requests than allowed in a minute, the service rejects them with a 429.
  • You may see headers like x-ms-retry-after-ms telling you how long to wait.

How monitoring helps:

  • Metrics as early warning: Watching token/request metrics shows when you’re approaching the cap.
  • Alerts before throttling: Warning alerts at 80% (200k TPM / 200 RPM) give you time to react before 429s hit.
  • Critical alerts at 100%: Confirm you’ve saturated the quota and need to adjust.

Important note:

  • Monitoring doesn’t prevent 429s; your app should still implement retry with backoff and consider batching or queuing requests (a minimal sketch follows after this list).
  • But with this setup, you’ll know before the error storm begins and can respond faster.
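
On the application side, here is a minimal sketch of retry with backoff around an Azure OpenAI call, using the openai Python package (v1.x). The endpoint, key, API version, and deployment name are placeholders; also note that the SDK already retries rate-limit errors a couple of times by default (its max_retries setting), so an explicit loop like this mainly matters when you want to honor the service’s retry hint or add your own queuing:

```python
# Sketch: exponential backoff with Retry-After handling for 429s.
# Placeholders: endpoint, api_key, api_version, and deployment name.
import time

from openai import AzureOpenAI, RateLimitError

client = AzureOpenAI(
    azure_endpoint="https://<your-resource>.openai.azure.com",
    api_key="<api-key>",
    api_version="2024-06-01",
)

def chat_with_backoff(messages, max_attempts=5):
    delay = 1.0
    for attempt in range(1, max_attempts + 1):
        try:
            return client.chat.completions.create(
                model="gpt-4.1-mini",   # your deployment name
                messages=messages,
            )
        except RateLimitError as err:
            if attempt == max_attempts:
                raise
            # Prefer the service's hint (x-ms-retry-after-ms), otherwise
            # fall back to exponential backoff.
            headers = getattr(err.response, "headers", {}) or {}
            retry_ms = headers.get("x-ms-retry-after-ms")
            wait = int(retry_ms) / 1000 if retry_ms else delay
            print(f"429 received, retrying in {wait:.1f}s (attempt {attempt})")
            time.sleep(wait)
            delay *= 2

reply = chat_with_backoff([{"role": "user", "content": "Hello!"}])
print(reply.choices[0].message.content)
```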

Why this matters

For many companies, time-to-value is more important than building a new monitoring stack.

This approach means:

  • No Log Analytics ingestion.
  • No need to replace Datadog or Splunk.
  • Free visibility into usage vs quota.
  • Proactive notifications on approaching limits.
  • Fewer surprises with 429 errors.

And if later you want deeper insights, you can still enable Log Analytics and export into your existing observability platform.

Closing thoughts

This article was inspired by a customer request, but I believe many others will benefit from the same approach. In just a few minutes, you can build a dashboard, set alerts, and gain confidence in your Azure OpenAI usage, all without leaving the Azure Portal.
I’d love to hear from you: how is your team monitoring Azure OpenAI today? Share in the comments; your feedback will help shape what we build next.

Updated Oct 03, 2025
Version 2.0