Forum Discussion
Turning “cool agent demos” into accountable systems – how are you doing this in Azure AI Foundry?
Hi MartijnMuilwijk, this is a great topic, and honestly one of the hardest parts of moving from agent demos to real value.
I’ve seen the exact same pattern you describe: teams can build something impressive in days, but the moment you say “this is going to production,” governance, security, finance, and audit all show up at once — usually with very reasonable questions that the demo never had to answer.
A few thoughts from what we’re seeing in Azure AI Foundry–based projects.
Governance: treating agents as first-class workloads (not experiments)
What’s worked best for us is explicitly classifying agents as applications, not AI experiments.
That means:
- Every agent has (a minimal manifest sketch follows this list):
  - A named business owner (not just a dev team)
  - A documented purpose and non-goals ("what this agent must never do")
  - An approval path to go live
- The agent lifecycle (build → test → deploy → retire) is aligned with existing app governance, not a parallel “AI process”
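To make that concrete, here's a minimal sketch of the per-agent manifest we ask for before anything ships. The field names are illustrative, not an official Azure AI Foundry schema:

```python
from dataclasses import dataclass, field

@dataclass
class AgentManifest:
    """Illustrative fields only; not an official Azure AI Foundry schema."""
    name: str
    business_owner: str              # a person, not just a dev team alias
    purpose: str                     # what the agent is for
    non_goals: list[str]             # what the agent must never do
    approved_for_production: bool = False
    approvers: list[str] = field(default_factory=list)

ticket_triage = AgentManifest(
    name="ticket-triage",
    business_owner="jane.doe@contoso.com",
    purpose="Classify and route inbound support tickets",
    non_goals=["Issue refunds", "Contact customers directly"],
)
```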
In practice, we push governance into:
- IaC / policy-as-code (resource scopes, network rules, managed identities, tool access)
- Eval-as-code (baseline evaluations that must pass before promotion; a rough gate sketch follows this list)
- Versioned configs (model, prompt, tools, data sources are all traceable)
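The eval-as-code piece is the one teams skip most often. Here's a rough sketch of a promotion gate, assuming you already have scored results from whatever evaluation harness you use; the metric names and thresholds are made up:

```python
# Hypothetical promotion gate run in CI before an agent config is promoted.
# The eval results would come from your evaluation harness; everything here is a placeholder.

BASELINE = {"groundedness": 0.85, "task_success": 0.90, "safety": 0.99}

def can_promote(results: dict[str, float]) -> bool:
    """Block promotion if any baseline metric is missing or below its threshold."""
    failures = {metric: results.get(metric, 0.0)
                for metric, threshold in BASELINE.items()
                if results.get(metric, 0.0) < threshold}
    if failures:
        print(f"Promotion blocked, below baseline: {failures}")
        return False
    return True

if __name__ == "__main__":
    results = {"groundedness": 0.88, "task_success": 0.86, "safety": 0.995}
    raise SystemExit(0 if can_promote(results) else 1)
```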
Your litmus test resonates strongly. If we can’t reconstruct why an agent responded the way it did — model version, data source, tools invoked — we don’t call it production-ready.
FinOps: cost guardrails early, not after the bill shock
Agentic systems are deceptively expensive because:
- Tool chaining hides cost
- Retries and reasoning depth compound quickly
- “Just one more call” becomes the norm
A few patterns that helped:
- Per-agent budgets enforced at the platform level (not just advisory)
- Hard separation between:
  - "Always-on" agents
  - "Exploratory / burst" agents
- Cost telemetry aligned to business units of value (a rough sketch follows this list):
  - Cost per ticket
  - Cost per document processed
  - Cost per workflow completion
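For the budget piece, a crude but effective guard looks something like this. The spend figure would really come from your metering pipeline, and the numbers are placeholders:

```python
from dataclasses import dataclass

@dataclass
class AgentBudget:
    """Hypothetical guardrail; spend would come from cost telemetry, not an in-memory counter."""
    monthly_limit_usd: float
    spent_usd: float = 0.0

    def authorize(self, estimated_cost_usd: float) -> bool:
        # A hard stop, not just an advisory alert.
        return self.spent_usd + estimated_cost_usd <= self.monthly_limit_usd

    def record(self, actual_cost_usd: float) -> None:
        self.spent_usd += actual_cost_usd

def cost_per_unit(total_cost_usd: float, units_completed: int) -> float:
    """The number finance actually wants: cost per ticket, per document, per workflow."""
    return total_cost_usd / max(units_completed, 1)

budget = AgentBudget(monthly_limit_usd=500.0, spent_usd=480.0)
if not budget.authorize(estimated_cost_usd=25.0):
    print("Call blocked: agent is over its monthly budget")
```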
Like you said, if we can’t explain an agent’s cost behavior to a finance leader in a couple of slides, it’s not ready.
Data, context, and lineage: where trust is won or lost
In real deployments, the model is rarely the biggest risk — data access is.
What’s been effective:
- Tying agents explicitly to data products or domains, not raw data stores
- Clear read/write contracts per agent
- Treating tool access as privileged operations, especially anything that mutates state (an enforcement sketch follows this list)
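Here's roughly what the read/write contract looks like when it's enforced in code rather than documented on a wiki. In practice this sits on top of managed identities, RBAC, and network rules; the shape below is purely illustrative:

```python
# Illustrative per-agent data/tool contract, enforced in-process as a last line of defence.
# Real enforcement lives in managed identities, RBAC and network rules.

CONTRACTS = {
    "ticket-triage": {
        "read":  {"support_tickets", "kb_articles"},
        "write": {"ticket_routing"},   # anything that mutates state is listed explicitly
    },
}

def check_access(agent: str, resource: str, mode: str) -> None:
    contract = CONTRACTS.get(agent, {})
    if resource not in contract.get(mode, set()):
        raise PermissionError(f"{agent} has no '{mode}' contract for '{resource}'")

check_access("ticket-triage", "support_tickets", "read")         # allowed
try:
    check_access("ticket-triage", "customer_payments", "write")  # not in the contract
except PermissionError as exc:
    print(exc)
```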
We also log:
- Which data sources were consulted
- What tools were invoked
- What outputs were generated
That makes “Where did this answer come from?” answerable — not perfectly, but credibly.
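The logging itself doesn't need to be fancy. A minimal per-run trace record, assuming you can hook the points where data sources and tools are invoked (the field names are ours, not a standard):

```python
import json
import uuid
from datetime import datetime, timezone

def new_trace(agent: str, model_version: str) -> dict:
    """One append-only record per agent run, queryable later."""
    return {
        "trace_id": str(uuid.uuid4()),
        "agent": agent,
        "model_version": model_version,
        "started_at": datetime.now(timezone.utc).isoformat(),
        "data_sources": [],   # which data sources were consulted
        "tool_calls": [],     # which tools were invoked, and with what arguments
        "outputs": [],        # what was generated or returned to the user
    }

trace = new_trace("ticket-triage", model_version="example-model-2024-08")
trace["data_sources"].append("kb_articles")
trace["tool_calls"].append({"tool": "route_ticket", "args": {"queue": "billing"}})
trace["outputs"].append("Routed ticket #12345 to the billing queue")
print(json.dumps(trace, indent=2))
```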
GreenOps: emerging, but increasingly real
This is still early, but it’s coming up more often.
Initial steps we’ve seen:
- Scheduling non-urgent agent workloads instead of running them immediately
- Batching similar requests
- Being intentional about region selection (latency and energy efficiency)
- Avoiding always-on agents when event-driven works just as well
Most customers aren’t measuring energy impact yet, but they are starting to ask the question — which usually means it’ll become a requirement sooner rather than later.
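Even before you can measure energy impact, the scheduling part is easy to start on. A rough sketch of deferring non-urgent work to an off-peak window; the window and dispatcher are placeholders:

```python
from datetime import datetime, time

OFF_PEAK_START, OFF_PEAK_END = time(22, 0), time(6, 0)   # placeholder window

def is_off_peak(now: datetime) -> bool:
    t = now.time()
    return t >= OFF_PEAK_START or t <= OFF_PEAK_END

def run_agent_job(job: dict) -> None:
    print(f"running {job['id']}")      # stand-in for the real dispatcher

deferred: list[dict] = []

def submit(job: dict, urgent: bool) -> None:
    """Run urgent work immediately; hold everything else for the off-peak batch."""
    if urgent or is_off_peak(datetime.now()):
        run_agent_job(job)
    else:
        deferred.append(job)           # drained later by a batch worker

submit({"id": "summarise-backlog"}, urgent=False)
```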
What tends to go wrong (lessons learned)
A few recurring themes:
- Lab agents that had unrestricted data access and no one noticed until security review
- Costs that scaled linearly in testing but exponentially in production
- “Temporary” prompts and tools that became permanent without review
- Ownership gaps (“Who is actually responsible for this agent?”)
The biggest lesson: retrofitting governance is much harder than designing for it.
A simple “minimum bar” before shipping an agent
What we increasingly insist on:
- Named owner and documented purpose
- Defined data access boundaries
- Basic evaluation and monitoring
- Cost visibility and limits
- Rollback plan
Nothing exotic — just enough structure so the agent survives contact with reality.
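If it helps, that minimum bar is simple enough to encode as a release check. The field names echo the manifest sketch earlier and are purely illustrative:

```python
# Illustrative release check; each entry maps to one bullet in the minimum bar above.
MINIMUM_BAR = [
    ("named owner and documented purpose",
     lambda a: bool(a.get("business_owner")) and bool(a.get("purpose"))),
    ("defined data access boundaries",  lambda a: "data_contract" in a),
    ("basic evaluation and monitoring", lambda a: a.get("eval_suite_passed") is True),
    ("cost visibility and limits",      lambda a: a.get("monthly_budget_usd", 0) > 0),
    ("rollback plan",                   lambda a: bool(a.get("rollback_plan"))),
]

def ready_to_ship(agent: dict) -> bool:
    missing = [name for name, check in MINIMUM_BAR if not check(agent)]
    for name in missing:
        print(f"Not met: {name}")
    return not missing
```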
Thanks for starting this discussion. I’d love to see more shared checklists and reference architectures from the community — especially examples where things didn’t go as planned. That’s usually where the most learning happens.