APIM within Foundry

Question

Dear Azure AI Foundry team at Microsoft,

Please reconsider the current architecture and developer experience around AI observability and token analytics.

As it stands today, customers are expected to assemble an entire distributed system — APIM, Azure Functions, Static Web Apps, App Insights, Log Analytics, custom SSE parsing, and additional infrastructure — just to answer very basic operational questions:

Which users are consuming the most tokens?
Which models are being used the most?
What are our real-time streaming costs?
Which subscriptions/projects are generating spend?

Even worse, many of these solutions break down when using streamed/SSE AI responses because APIM policies are not designed to reliably process chunked AI streams and partial JSON bodies.
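
To make the partial-JSON problem concrete, here is a rough Python sketch of what correct handling has to do: buffer raw bytes until a complete SSE event (terminated by a blank line) has arrived, and only then parse its JSON. It assumes an OpenAI-style stream in which each event is a `data: {...}` line, the stream ends with `data: [DONE]`, and the final chunk carries a `usage` object when usage reporting has been requested. The function names are hypothetical, and this is an illustration of the technique, not what APIM does internally.

```python
import json
from typing import Iterable, Iterator, Optional


def iter_sse_data(chunks: Iterable[bytes]) -> Iterator[str]:
    """Yield the data payload of each complete SSE event.

    Network chunks can end anywhere, including mid-JSON or mid-character,
    so bytes are buffered until a blank line marks the end of an event.
    """
    buffer = b""
    for chunk in chunks:
        # Normalise CRLF after appending so a "\r\n" split across chunks is handled.
        buffer = (buffer + chunk).replace(b"\r\n", b"\n")
        while b"\n\n" in buffer:
            event, buffer = buffer.split(b"\n\n", 1)
            data_lines = []
            for line in event.split(b"\n"):
                if line.startswith(b"data:"):
                    value = line[len(b"data:"):]
                    # The SSE format drops one leading space after the colon.
                    if value.startswith(b" "):
                        value = value[1:]
                    data_lines.append(value)
            if data_lines:
                # Decode only complete events, so split multi-byte characters are safe.
                yield b"\n".join(data_lines).decode("utf-8")


def final_usage(chunks: Iterable[bytes]) -> Optional[dict]:
    """Return the last non-empty `usage` object seen in the stream, if any.

    Assumes OpenAI-style chunks and the "[DONE]" end-of-stream sentinel.
    """
    usage = None
    for data in iter_sse_data(chunks):
        if data == "[DONE]":
            break
        payload = json.loads(data)
        if payload.get("usage"):
            usage = payload["usage"]
    return usage
```

Anything that tries to parse each network chunk as JSON on its own will fail as soon as a chunk boundary falls inside an event, which is exactly the failure mode described above.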

So customers end up building increasingly complicated middleware pipelines for functionality that should already exist natively inside the platform.

At the same time:

Azure clearly has access to token and billing telemetry internally
customers are still billed for usage
yet customers themselves are not given equivalent real-time visibility or tooling

That creates a frustrating disconnect and makes it feel like a money grab. It's like paying for groceries and not being allowed to receive a receipt.

Another major issue is API key management. Providing effectively a single project-level credential for enterprise AI workloads creates operational and governance limitations that make multi-user auditing unnecessarily difficult. Why in the world would the Foundry team design this with only one API key per project? Is there a secret reason for this, other than annoying customers?

To be blunt:
the current system design feels massively overengineered for customers while simultaneously underdelivering on the core metrics enterprises actually need.

AI platform teams should not need to build 10+ supporting Azure services just to approximate token analytics for a single Foundry project.

Azure has excellent infrastructure capabilities overall, which is exactly why this experience is so surprising.

But if the platform architecture and observability story for AI workloads do not improve soon, many organizations — including ours — will seriously evaluate moving to alternative cloud providers and AI gateway solutions that provide simpler and more transparent operational tooling.

Please prioritize:

native streaming token telemetry
first-class SSE observability
proper per-user/per-model analytics
better API credential management
simpler AI cost governance workflows

Right now, the operational overhead compared to the value delivered is far too high.

davidvoh · Answer

There are also no instructions on how to do any of it. For example, I have spent 10 hours so far trying to configure an AI Gateway to OpenAI and use it through an Agent. Doing so requires understanding the undocumented, behind-the-scenes use of Key Vault for the API key, which is broken.
