Forum Discussion
APIM within Foundry
Dear Azure AI Foundry team at Microsoft,
Please reconsider the current architecture and developer experience around AI observability and token analytics.
As it stands today, customers are expected to assemble an entire distributed system — APIM, Azure Functions, Static Web Apps, App Insights, Log Analytics, custom SSE parsing, and additional infrastructure — just to answer very basic operational questions:
- Which users are consuming the most tokens?
- Which models are being used the most?
- What are our real-time streaming costs?
- Which subscriptions/projects are generating spend?
Even worse, many of these solutions break down when using streamed/SSE AI responses because APIM policies are not designed to reliably process chunked AI streams and partial JSON bodies.
So customers end up building increasingly complicated middleware pipelines for functionality that should already exist natively inside the platform.
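To illustrate the kind of stream handling this middleware ends up doing, here is a minimal Python sketch of buffering a streamed response and pulling out token usage. It is an illustration under stated assumptions, not how APIM or Foundry work internally. It assumes an OpenAI-style chat completions stream: events separated by a blank line, payloads on `data:` lines, a `[DONE]` sentinel, and a `usage` object in a late chunk (typically only when the client asks for it, for example via `stream_options.include_usage`). The names `SseUsageExtractor` and `usage_from_stream` are hypothetical.

```python
import json
from typing import Iterable, Optional


class SseUsageExtractor:
    """Buffers raw stream bytes and keeps the last `usage` object seen.

    Assumes OpenAI-style SSE events; this is a simplification that handles
    LF and CRLF line endings but not bare CR terminators.
    """

    def __init__(self) -> None:
        self._buffer = b""
        self.usage: Optional[dict] = None

    def feed(self, chunk: bytes) -> None:
        # Network chunks can end mid-line or mid-JSON, so never parse a chunk
        # directly: append it and only consume complete events.
        self._buffer = (self._buffer + chunk).replace(b"\r\n", b"\n")
        while b"\n\n" in self._buffer:
            raw_event, self._buffer = self._buffer.split(b"\n\n", 1)
            self._handle_event(raw_event.decode("utf-8"))

    def _handle_event(self, event: str) -> None:
        # An event's payload is its `data:` lines joined with newlines.
        data_lines = [
            line[len("data:"):].lstrip()
            for line in event.split("\n")
            if line.startswith("data:")
        ]
        payload = "\n".join(data_lines)
        if not payload or payload == "[DONE]":
            return
        try:
            obj = json.loads(payload)
        except json.JSONDecodeError:
            return  # ignore payloads that are not JSON
        if isinstance(obj, dict) and obj.get("usage"):
            self.usage = obj["usage"]


def usage_from_stream(chunks: Iterable[bytes]) -> Optional[dict]:
    """Feeds every chunk through the extractor and returns the usage, if any."""
    extractor = SseUsageExtractor()
    for chunk in chunks:
        extractor.feed(chunk)
    return extractor.usage
```

The key design choice is that parsing happens per complete event rather than per network chunk, which is exactly the step that is awkward to express in a gateway policy.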
At the same time:
- Azure clearly has access to token and billing telemetry internally
- customers are still billed for usage
- yet customers themselves are not given equivalent real-time visibility or tooling
That creates a frustrating disconnect and makes the whole arrangement feel like a money grab. It's like paying for groceries and being refused a receipt.
Another major issue is API key management. Providing what is effectively a single project-level credential for enterprise AI workloads creates operational and governance limitations that make multi-user auditing unnecessarily difficult. Why in the world would the Foundry team design this with only one API key per project? Is there a reason for this, other than annoying customers?
To be blunt: the current system design feels massively overengineered for customers while simultaneously underdelivering on the core metrics enterprises actually need.
AI platform teams should not need to build 10+ supporting Azure services just to approximate token analytics for a single Foundry project.
Azure has excellent infrastructure capabilities overall, which is exactly why this experience is so surprising.
But if the platform architecture and observability story for AI workloads do not improve soon, many organizations — including ours — will seriously evaluate moving to alternative cloud providers and AI gateway solutions that provide simpler and more transparent operational tooling.
Please prioritize:
- native streaming token telemetry
- first-class SSE observability
- proper per-user/per-model analytics
- better API credential management
- simpler AI cost governance workflows
Right now, the operational overhead compared to the value delivered is far too high.