Forum Discussion
Turning “cool agent demos” into accountable systems – how are you doing this in Azure AI Foundry?
Hi everyone,
I’m working with customers who are very excited about the new agentic capabilities in Azure AI Foundry (and the Microsoft Agent Framework). The pattern is always the same:
- Building a cool agent demo is easy.
- Turning it into an accountable, production-grade system that governance, FinOps, security and data people are happy with… not so much.
I’m curious how others are dealing with this in the real world. Here’s how I currently frame it with customers, and I’d love to hear where you do things differently or better.
Governance: who owns the agent, and what does “safe enough” mean?
For us, an agent is not “just another script”. It’s a proper application with:
- An owner (a real person, not a team name).
- A clear purpose and scope.
- A policy set (what it can and cannot do).
- A minimum set of controls (access, logging, approvals, evaluation, rollback).
In Azure AI Foundry terms: we try to push as much as possible into “as code” (config, infra, CI/CD) instead of burying it in PowerPoint and Word docs.
The litmus test I use: if this agent makes a bad decision in production, can we show – to audit or leadership – which data, tools, policies and model versions were involved? If the answer is “not really”, we’re not done.
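To make the “as code” point concrete, here is a minimal sketch in plain Python of the kind of per-agent manifest and per-run audit record we aim for, so the litmus-test question can be answered from logs rather than from memory. Everything here is hypothetical and illustrative (class names, fields, the example agent and model deployment); it is not a Foundry SDK API, just the shape of the data we want version-controlled and emitted on every run.

```python
# Sketch: agent-as-application metadata plus a per-run audit record.
# All names and values are illustrative; in practice the manifest lives in a
# version-controlled repo and run records go to your tracing/audit store.
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone
import json


@dataclass
class AgentManifest:
    """Version-controlled 'contract' for one agent."""
    agent_id: str
    owner: str                        # a real person, not a team alias
    purpose: str                      # one-sentence scope statement
    allowed_tools: list[str]          # what it can call
    allowed_data_products: list[str]  # what it can read/write
    policy_set_version: str           # tag of the policy repo it was built against
    model_deployment: str             # model + version actually used


@dataclass
class RunRecord:
    """One entry per production run, written to an audit store."""
    agent_id: str
    manifest_version: str
    model_deployment: str
    data_sources_used: list[str]
    tools_invoked: list[str]
    started_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def to_json(self) -> str:
        return json.dumps(asdict(self), indent=2)


# Illustrative usage: one manifest per agent, one record per run.
manifest = AgentManifest(
    agent_id="invoice-triage-agent",
    owner="jane.doe@contoso.com",
    purpose="Route incoming invoices to the right approval queue",
    allowed_tools=["lookup_vendor", "create_ticket"],
    allowed_data_products=["finance.invoices.v2"],
    policy_set_version="policies-2024.06",
    model_deployment="gpt-4o-2024-08-06",
)
record = RunRecord(
    agent_id=manifest.agent_id,
    manifest_version="1.4.0",
    model_deployment=manifest.model_deployment,
    data_sources_used=["finance.invoices.v2"],
    tools_invoked=["lookup_vendor"],
)
print(record.to_json())
```

If a bad decision lands in production, the audit conversation then starts from a concrete record instead of a reconstruction exercise.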
FinOps: if you can’t cap it, you can’t scale it
Agentic solutions are fantastic at chaining calls and quietly generating cost.
We try to design with:
- Explicit cost budgets per agent / per scenario.
- A clear separation between “baseline” workloads and “burst / experimentation”.
- Observability on cost per unit of value (per ticket, per document, per transaction, etc.).
Some of this maps nicely onto existing cloud FinOps practices; some of it feels new because of LLM behaviour. My personal rule: I don’t want to ship an agent to production if I can’t explain its cost behaviour to a CFO in 2–3 slides.
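As a rough illustration of “explicit cost budgets per agent / per scenario” and “cost per unit of value”, here is a toy sketch. The token prices, scenario name and the whole accounting model are assumptions for the example, not real pricing or a Cost Management API; in practice the numbers come from your billing exports.

```python
# Toy sketch: per-scenario cost budget plus a cost-per-unit-of-value metric.
# Prices and thresholds below are placeholders, not real rates.
from dataclasses import dataclass


@dataclass
class ScenarioBudget:
    scenario: str
    monthly_budget_eur: float
    spent_eur: float = 0.0
    units_of_value: int = 0   # e.g. tickets resolved, documents processed

    def record_run(self, prompt_tokens: int, completion_tokens: int,
                   price_per_1k_prompt: float, price_per_1k_completion: float,
                   produced_value: bool) -> None:
        cost = (prompt_tokens / 1000 * price_per_1k_prompt
                + completion_tokens / 1000 * price_per_1k_completion)
        self.spent_eur += cost
        if produced_value:
            self.units_of_value += 1

    @property
    def over_budget(self) -> bool:
        return self.spent_eur > self.monthly_budget_eur

    @property
    def cost_per_unit(self) -> float:
        return self.spent_eur / self.units_of_value if self.units_of_value else 0.0


budget = ScenarioBudget(scenario="ticket-triage", monthly_budget_eur=500.0)
budget.record_run(1200, 300, price_per_1k_prompt=0.005,
                  price_per_1k_completion=0.015, produced_value=True)
if budget.over_budget:
    raise RuntimeError(f"{budget.scenario} exceeded its monthly budget")
print(f"cost per ticket so far: €{budget.cost_per_unit:.4f}")
```

The point is less the arithmetic and more that “cost per ticket / per document” becomes a first-class metric you can put in front of a CFO.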
Data, context and lineage: where most of the real risk lives
In my experience, most risk doesn’t come from the model, but from:
- Which data the agent can see.
- How fresh and accurate that data is.
- Whether we can reconstruct the path from data → answer → decision.
We’re trying to anchor on:
- Data products/domains as the main source of truth.
- Clear contracts around what an agent is allowed to read or write.
- Strong lineage for anything that ends up in front of a user or system of record.
From a user’s point of view, “Where did this answer come from?” is quickly becoming one of the most important questions.
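To make “reconstruct the path from data → answer → decision” tangible, here is a minimal sketch of the lineage record we try to attach to anything that reaches a user or a system of record. All names and fields are hypothetical; no particular catalog or lineage product is assumed, and real implementations would usually hook into an existing one rather than hand-roll this.

```python
# Sketch: a lineage record tying one answer back to its inputs and its outcome.
# Field names and example values are illustrative only.
from dataclasses import dataclass, field
from datetime import datetime, timezone
import uuid


@dataclass
class SourceRef:
    data_product: str   # e.g. "finance.invoices.v2"
    snapshot: str       # version or as-of timestamp of the data that was read
    access_mode: str    # "read" or "write", per the agent's data contract


@dataclass
class AnswerLineage:
    answer_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    created_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )
    sources: list[SourceRef] = field(default_factory=list)
    model_deployment: str = ""
    decision: str = ""  # what was ultimately done with the answer


lineage = AnswerLineage(
    sources=[SourceRef("finance.invoices.v2", "2024-06-01T00:00:00Z", "read")],
    model_deployment="gpt-4o-2024-08-06",
    decision="invoice routed to manual review",
)
# "Where did this answer come from?" -> persist and query records like this one.
print(lineage)
```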
GreenOps / sustainability: starting to show up in conversations
Some customers now explicitly ask:
- “What is the energy impact of this AI workload?”
- “Can we schedule, batch or aggregate work to reduce energy use and cost?”
So we’re starting to treat GreenOps as the “next layer” after cost: not just “is it cheap enough?”, but also “is it efficient and responsible enough?”.
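On the scheduling/batching angle specifically, a toy sketch of one lever: deferring non-urgent agent requests and processing them in fewer, larger batches, trading latency for efficiency. The queue, thresholds and the process_batch hook are all assumptions for illustration, not anything Foundry-specific.

```python
# Toy sketch: defer non-urgent agent work and flush it in batches.
# BATCH_SIZE, MAX_WAIT and process_batch() are placeholders.
from collections import deque
from datetime import datetime, timedelta, timezone

BATCH_SIZE = 50
MAX_WAIT = timedelta(hours=4)

queue: deque[tuple[datetime, str]] = deque()


def enqueue(request: str) -> None:
    queue.append((datetime.now(timezone.utc), request))


def flush_if_ready(process_batch) -> None:
    """Flush when the batch is full or the oldest item has waited long enough."""
    if not queue:
        return
    oldest, _ = queue[0]
    if len(queue) >= BATCH_SIZE or datetime.now(timezone.utc) - oldest >= MAX_WAIT:
        batch = [queue.popleft()[1] for _ in range(len(queue))]
        process_batch(batch)


for i in range(BATCH_SIZE):
    enqueue(f"summarise document {i}")
flush_if_ready(lambda batch: print(f"processing {len(batch)} deferred requests"))
```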
What I’d love to learn from this community:
- In your Azure AI Foundry/agentic solutions, where do governance decisions actually live today? Mostly in documentation and meetings, or do you already have patterns for policy-as-code / eval-as-code?
- How are you bringing FinOps into the design of agents? Do you have concrete cost KPIs per agent/scenario, or is it still “we’ll see what the bill says”?
- How are you integrating data governance and lineage into your agent designs? Are you explicitly tying agents to data products/domains with clear access rules? Any “red lines” for data they must never touch?
- Has anyone here already formalised “GreenOps” thinking for AI Foundry workloads? If yes, what did you actually implement (scheduling, consolidation, region choices, something else)?
- And maybe the most useful bit: what went wrong for you so far? Without naming customers, obviously. Any stories where a nice lab pattern didn’t survive contact with governance, security or operations?
I’m especially interested in concrete patterns, checklists or “this is the minimum we insist on before we ship an agent” criteria. Code examples are very welcome, but I’m mainly looking for the operating model and guardrails around the tech.
Thanks in advance for any insights, patterns or war stories you’re willing to share.