azure ai services
577 TopicsNeed Guidance on cost breakdown of Microsoft Foundry Agent portal I created
I have developed a complaint handling portal for customers and employees using Azure AI Foundry. The solution is built with Foundry agents, models from the catalog, input/output caching, agent logging/tracing, and other Foundry capabilities. The frontend and orchestration layer are deployed on Azure Container Apps. While Azure Cost Analysis provides an overview of spending, several parts remain unclear or act as a black box for accurate estimation, including: Token consumption assumptions (input/output tokens across different models and agents) User concurrency, sessions, and behavior patterns Agent logging and observability costs Impact of input/output caching Detailed resource consumption and billing in Azure Container Apps What is the best way to accurately calculate or estimate the total running cost for such an Azure AI Foundry-based platform with Container Apps frontend? Are there official Microsoft documentation, pricing guides, or reference architectures for cost breakdown? How do companies typically present costs for such AI platforms to attract customers (e.g., TCO models or per-user pricing)? I want to know how the platform costs are shown to customers. Thank you.Power Up Your Open WebUI with Azure AI Speech: Quick STT & TTS Integration
Introduction Ever found yourself wishing your web interface could really talk and listen back to you? With a few clicks (and a bit of code), you can turn your plain Open WebUI into a full-on voice assistant. In this post, you’ll see how to spin up an Azure Speech resource, hook it into your frontend, and watch as user speech transforms into text and your app’s responses leap off the screen in a human-like voice. By the end of this guide, you’ll have a voice-enabled web UI that actually converses with users, opening the door to hands-free controls, better accessibility, and a genuinely richer user experience. Ready to make your web app speak? Let’s dive in. Why Azure AI Speech? We use Azure AI Speech service in Open Web UI to enable voice interactions directly within web applications. This allows users to: Speak commands or input instead of typing, making the interface more accessible and user-friendly. Hear responses or information read aloud, which improves usability for people with visual impairments or those who prefer audio. Provide a more natural and hands-free experience especially on devices like smartphones or tablets. In short, integrating Azure AI Speech service into Open Web UI helps make web apps smarter, more interactive, and easier to use by adding speech recognition and voice output features. If you haven’t hosted Open WebUI already, follow my other step-by-step guide to host Ollama WebUI on Azure. Proceed to the next step if you have Open WebUI deployed already. Learn More about OpenWeb UI here. Deploy Azure AI Speech service in Azure. Navigate to the Azure Portal and search for Azure AI Speech on the Azure portal search bar. Create a new Speech Service by filling up the fields in the resource creation page. Click on “Create” to finalize the setup. After the resource has been deployed, click on “View resource” button and you should be redirected to the Azure AI Speech service page. The page should display the API Keys and Endpoints for Azure AI Speech services, which you can use in Open Web UI. Settings things up in Open Web UI Speech to Text settings (STT) Head to the Open Web UI Admin page > Settings > Audio. Paste the API Key obtained from the Azure AI Speech service page into the API key field below. Unless you use different Azure Region, or want to change the default configurations for the STT settings, leave all settings to blank. Text to Speech settings (TTS) Now, let's proceed with configuring the TTS Settings on OpenWeb UI by toggling the TTS Engine to Azure AI Speech option. Again, paste the API Key obtained from Azure AI Speech service page and leave all settings to blank. You can change the TTS Voice from the dropdown selection in the TTS settings as depicted in the image below: Click Save to reflect the change. Expected Result Now, let’s test if everything works well. Open a new chat / temporary chat on Open Web UI and click on the Call / Record button. The STT Engine (Azure AI Speech) should identify your voice and provide a response based on the voice input. To test the TTS feature, click on the Read Aloud (Speaker Icon) under any response from Open Web UI. The TTS Engine should reflect Azure AI Speech service! Conclusion And that’s a wrap! You’ve just given your Open WebUI the gift of capturing user speech, turning it into text, and then talking right back with Azure’s neural voices. Along the way you saw how easy it is to spin up a Speech resource in the Azure portal, wire up real-time transcription in the browser, and pipe responses through the TTS engine. From here, it’s all about experimentation. Try swapping in different neural voices or dialing in new languages. Tweak how you start and stop listening, play with silence detection, or add custom pronunciation tweaks for those tricky product names. Before you know it, your interface will feel less like a web page and more like a conversation partner.2KViews3likes2CommentsThe Business Foundation: Why Most Companies Aren’t Ready for Agentic AI
Before agents can execute decisions, organizations must redesign how they structure responsibility, data, governance, and operational context before autonomy can scale. The enterprise AI landscape has shifted. Organizations are moving beyond chatbots and isolated predictive models toward systems that can plan, decide, and execute multi-step work across finance, engineering operations, supply chains, and customer service. Many analysts now expect agentic AI to unlock major productivity gains across knowledge work. But despite the momentum, adoption remains limited. As of 2025, only about 2% of organizations have deployed agent-based systems at real operational scale, while most remain stuck in pilots. The reason is not model capability. It is readiness. The Core Problem Most organizations still treat AI adoption as a technical rollout exercise and measure progress through deployment indicators such as copilots enabled, pilots launched, or models evaluated. These metrics reflect experimentation activity, but they do not show whether an organization is ready to operate systems that make decisions and execute actions inside business workflows. Agentic systems do more than generate insights; they participate directly in operational processes. The gap between deploying AI tools and safely delegating decision-making authority to them is where many transformation efforts begin to stall. True enterprise readiness for agentic AI is not defined by how many models an organization deploys or how many pilots it launches. It depends on whether the organization can safely delegate bounded decisions to autonomous systems. In practice, this requires: Strategy and decision scoping: identifying where autonomous execution creates value and where human oversight must remain in place Process and decision-system maturity: redesigning workflows for human-agent collaboration with clear escalation boundaries Context-ready data foundations: ensuring agents operate on consistent, policy-aware operational context rather than fragmented data silos Governance and accountability structures: defining what agents may recommend, execute, escalate, or never touch, supported by auditability and oversight Team readiness and lifecycle management: preparing teams to supervise autonomous execution and managing agents as ongoing operational participants rather than static tools Coordination architecture readiness: aligning multiple agents across domains so local optimization does not create organizational conflict This article explains why traditional enterprise environments are not yet prepared for autonomous agents, what true agentic readiness actually looks like in practice, and the sequence of organizational changes required before decision-capable systems can be deployed safely at scale. I. The Readiness Illusion and the Root Causes of Failure Most organizations are deploying agentic systems into environments designed exclusively for human execution. That mismatch produces predictable friction across five structural layers. 1. Fragmented Operational Context (The Data Problem) Enterprises have a lot of data. What they often lack is usable context. Traditional systems record what happened. Agents also need to understand why something happened, how systems are connected, and where policy limits apply. In most organizations, customer systems, telemetry platforms, identity services, and finance tools do not stay aligned in real time. As a result, agents operate across disconnected information rather than a shared operational picture. This creates real risk. With generative AI, poor data quality usually produces a weak answer. With agentic AI, poor data quality can produce the wrong action at scale. More APIs, more pipelines, and more dashboards do not fix this by themselves. Without a shared semantic context across systems, agents can still make decisions that are internally logical but operationally wrong. For example, an agent may see that a customer received a large discount and conclude that future discounts should be limited, while missing that the original discount was approved because of a service outage and a retention risk. The data is available, but the business meaning behind it is not. 2. Undocumented Decision Systems Most organizations document workflows. However, very few document decision authority clearly enough for autonomous execution. Agents need to know where they are allowed to act, when they must escalate, and which decisions remain human-only. Without these boundaries, organizations often follow the same pattern: the first unexpected situation appears, confidence drops, and the agent is switched off. This is not a model problem. It is a decision-structure problem. Before deploying agents, organizations must be able to explain which decisions can be delegated and who remains responsible for each step. Many cannot yet do this. 3. The Governance Paradox Agentic systems do not fit traditional governance models. Most organizations still assume a simple structure: user → application → resource Agent-based systems introduce a new layer: user → agent → tools → resource This change affects access control, compliance processes, and audit visibility. Organizations usually buy agents like software tools but must manage them more like team members. That gap is rarely addressed before deployment begins. This issue is already visible today. Many enterprises are using vendor copilots and embedded AI features inside business systems without clear ownership, audit coverage, or governance rules. This creates a growing “shadow AI” layer even before intentional agent programs start. 4. Identity and Accountability Ambiguity Many organizations cannot clearly answer a simple question: who is responsible when an agent makes a mistake? In practice, agents often receive permissions that are broader than necessary, execution traces are difficult to follow across multiple systems, and accountability is split between IT, compliance, and business teams. Without clear attribution, autonomy introduces hidden risk instead of efficiency. Delegation without accountability is not automation. It is unmanaged risk. 5. Organizational Misalignment Most transformation programs still assume employees will use AI as a tool. Agentic environments change the role of employees from operators to supervisors. People are expected to review outcomes, guide behavior, and manage exceptions instead of executing every step themselves. Research from BCG shows that around 70% of AI project challenges come from people and process issues rather than technology. Organizations that invest in change management are significantly more likely to see successful results. Organizational readiness is not something to address later. It is required before agents can operate safely. Common Failure Patterns at a Glance Common failure patterns like these are already visible in real deployments. The Klarna case illustrates the challenge well. After replacing several hundred customer service roles with AI, the company later reported lower resolution quality for complex cases, declining satisfaction scores, and higher escalation rates, which led to renewed hiring in support roles. The outcome did not point to a failure of the model itself. It highlighted what happens when autonomous systems are introduced without the supporting process, governance, and team structures required for sustained operation. II. Defining True Agentic Readiness Agentic readiness is not just about having the right tools in place. It is about whether the organization has the capability to use autonomous systems safely and effectively. Definition Agentic readiness is the ability to safely delegate bounded operational decisions to autonomous systems while maintaining accountability, observability, and policy alignment across the full execution chain. Research consistently shows that organizations benefit from AI only when multiple maturity layers advance together. The MIT CISR AI Maturity Model, based on data from 721 companies, demonstrates that financial performance improves as organizations progress through the stages. Companies in early stages often perform below industry averages, while those reaching later stages perform significantly better. The key insight is that maturity is cumulative. Organizations cannot skip foundational steps and still expect reliable outcomes. For agentic systems, those cumulative layers include strategy alignment, decision-ready processes, context-ready data, governance structures, organizational roles, and technical architecture. When only some of these elements are in place, organizations produce pilots. When they advance together, organizations produce transformation. From Activity Metrics to Outcome Metrics One of the clearest signs of readiness is how an organization measures progress. Organizations at an early stage usually focus on activity: Number of models deployed Pilots launched Features enabled User onboarding numbers and API call volume More mature organizations focus on outcomes: Better decision quality and fewer errors Higher throughput for clearly defined tasks Consistent operation within safe autonomy boundaries Complete audit trails and accurate escalation handling This is not a semantic distinction. Organizations measuring activity invest indefinitely in pilots because they have no signal telling them a pilot has succeeded or failed. The measurement framework is itself a prerequisite for the transformation sequence. III. The Transformation Sequence Most Organizations Skip Many organizations begin agent adoption in the wrong order. Platforms are procured before governance is defined. Models are evaluated before workflows are structured. Autonomy is introduced before decision authority is mapped. The result is not faster progress. It is earlier failure, followed by expensive cleanup later. In traditional cloud transformation, architecture precedes automation. Agentic transformation follows the same rule: decision structure must exist before delegation can scale. Step 1: Strategic Alignment and Decision Scoping Organizations should begin by identifying where autonomy creates value safely — not where it is technically possible and not where ambitions are highest. Strong early candidates usually share the same characteristics: structured decisions, bounded scope, reversible actions, and high execution frequency. Typical examples include incident triage and routing, capacity classification, environment status updates, and prioritization support. These are good starting points not because they are simple, but because failures are visible, recoverable, and useful for learning. Delegation should grow gradually from bounded decision spaces toward broader authority. Organizations that struggle often start with highly visible, high-risk use cases because the business case looks attractive. Organizations that succeed usually begin with frequent, lower-impact decisions where feedback loops are short and improvements can happen quickly. Step 2: Process Maturity and Boundary Setting Agents do not fix broken workflows. They execute them faster. If a process depends on informal judgment, tribal knowledge, or undocumented exception handling, an agent will reproduce those weaknesses at machine speed. Before introducing autonomy, organizations should establish structured runbooks with clear execution paths, explicit escalation logic an agent can evaluate, defined exception-handling rules that do not rely on intuition, and clear boundaries between decisions an agent may take and those that must remain with humans. This level of discipline requires documentation precision that many organizations have never needed before. A statement such as “the engineer uses judgment” is not a runbook step. It is an undocumented dependency that will later appear as an agent failure. This is also where leaders face a practical choice: add agents on top of fragile legacy workflows, or redesign those workflows so delegation can happen safely. In many cases, the second path is slower at first but far more durable. Step 3: Data Context and Decision Awareness Agents cannot operate reliably in fragmented environments. The solution is not simply collecting more data. What they require is decision-aware context: structured knowledge about relationships between systems, service dependencies, environment classification, policy boundaries, and operational intent. This is a different challenge from building analytics platforms. Analytics depends on broad visibility across large datasets. Agentic execution depends on precise, current, and consistent information at the moment a decision is made. A customer record that is accurate enough for reporting may not be reliable enough for an agent executing a contract action. Because of this difference, data readiness becomes a leadership concern rather than only an infrastructure task. Microsoft’s digital transformation guidance captures this clearly with the principle “no AI without data”: organizations should identify critical data sources, establish governance ownership, improve quality, and define controlled access before introducing agents into operational workflows. Step 4: Governance and Delegation Redesign Organizations must explicitly define four categories of agent authority before deployment: What agents may recommend (advisory boundary) What agents may execute autonomously (execution boundary) What requires human approval before execution (escalation boundary) What remains permanently restricted regardless of confidence (prohibition boundary) These policies cannot remain static. Agentic systems require continuous supervision, not periodic review. Research supports this shift. Studies of governance professionals working with autonomous systems show that adopting traditional Enterprise Risk Management frameworks alone does not significantly reduce governance incidents. What makes the difference is integrating human oversight into execution loops and strengthening machine identity security. In practice, this means organizations need a delegated-autonomy governance function: a cross-functional group with representation from IT, compliance, legal, and business teams that continuously defines and monitors the boundaries of agent behavior. This is different from extending existing approval committees. Governance must move from acting as a gate before deployment to operating as a supervision layer throughout the lifecycle of the agent. This creates a basic operational tension: organizations adopt agents to reduce manual work, but safe autonomy requires stronger supervision, better observability, and tighter control over identity and permissions — especially in the early stages. Step 5: Operating Model Redesign: Operationalizing Human-Agent Collaboration Agentic systems create responsibilities that do not yet exist in most organizations. This shift is not mainly about replacing people with agents. It is about redesigning how people work with them, supervise them, and remain accountable for outcomes. New operational roles typically include: Agent reliability engineers who monitor performance, detect degradation, and define retraining triggers Policy designers who translate business rules into machine-evaluable decision logic Workflow supervisors who oversee autonomous execution and handle escalations Context curators who maintain the data foundations agents depend on for accurate reasoning Organizations that succeed with agents do not treat them as static automation tools. They treat them as managed participants inside workflows. That is why they need an HR layer for agents. An HR layer for agents means applying the same lifecycle thinking used for people to autonomous systems. Before an agent is allowed to operate, it needs a clearly defined role, scope, level of authority, and access to the right systems. Once deployed, its performance must be reviewed over time, its behavior monitored, and its permissions adjusted when quality drops or risks increase. When the agent no longer fits the workflow, it should be retired or replaced instead of being left running by default. In practice, this means agent management should include: Onboarding, by defining scope, authority, and access boundaries Supervision, through observability, escalation paths, and performance review Retraining or re-scoping, when quality declines or conditions change Retirement, when the agent no longer fits the process or creates more risk than value In higher-risk workflows, this HR layer must also include graceful degradation. For example, an underperforming agent may automatically lose write access, be moved to read-only mode, and hand control back to a human supervisor until its behavior is corrected. This shift also requires leadership readiness. The Harvard 2025 Global Leadership Development Study found that 71% of senior leaders now see the ability to lead through continuous change as critical, yet only 36% say AI is fully integrated into their strategy. That gap between intention and execution is where many organizational transformation programs begin to stall. Step 6: Coordination Architecture Readiness As organizations deploy agents across multiple domains, a new challenge appears: agents begin optimizing locally instead of organizationally. An agent focused on cost efficiency in one area may conflict with another agent responsible for quality assurance elsewhere. Without coordination structures, these conflicts often remain invisible until they surface as operational failures. Coordination architecture helps align agent behavior across the organization. It ensures policy consistency between agents, maintains a shared understanding of the operational environment, prevents conflicts when actions intersect, and supports stable communication between agents working together across workflows. This capability is not required for the first agent deployment. It becomes important as soon as organizations begin operating multiple agents in parallel. Many organizations encounter coordination problems earlier than expected, which is why coordination readiness belongs in the transformation sequence even if its implementation happens later. Local optimization is rarely what enterprises intend. Coordination architecture is how you prevent it from becoming what they get. IV. The Regulatory Clock Is Already Running For organizations operating in or serving European markets, readiness is no longer only a strategic question. It is also a regulatory one. The EU AI Act’s high-risk provisions take effect in August 2026, with potential penalties reaching €35 million or 7% of global revenue. Colorado’s AI Act follows in June 2026, and a growing number of U.S. states now require documented AI governance programs for specific sectors and use cases. The governance and data foundations described earlier in this article are therefore not only best practice. For many organizations, they are becoming compliance prerequisites. Treating readiness as optional before deployment increasingly means accepting regulatory exposure before value is realized. The transformation sequence described here is not a slower path to deployment. It is the only path that avoids accumulating technical and legal risk at the same time. V. Conclusion: Shifting Toward Outcome-Based Pragmatism Agentic systems rarely fail because language models are incapable. They fail because they are introduced into environments designed for human execution, governed by frameworks built for deterministic software, and evaluated using metrics that cannot distinguish a promising pilot from a production-ready capability. The readiness gap is structural and, in many cases, self-inflicted. Organizations skip foundational steps because platform procurement is faster, more visible, and easier to justify internally than operating-model redesign. The result is earlier failure, higher remediation cost, and — in regulated industries — increasing legal exposure. What this means in practice Organizations should stop measuring readiness through activity indicators and start measuring it through decision quality, execution safety, throughput improvement, and bounded autonomy performance. Governance and data foundations must be established before platform rollout. Organizational transition planning must begin before deployment. Decision authority must be defined before the first agent workflow is introduced. Only then can enterprises safely unlock the productivity gains promised by agentic systems — not because the technology suddenly becomes capable, but because the organization becomes ready to use it. Up Next in This Series Part 2 looks at the cloud foundation needed for safe agent deployment, including identity-first architecture, observability, policy controls, and the platform constraints that often appear only after design decisions have been made. Part 3 focuses on how to design agents that work reliably in enterprise environments, including RAG maturity, loop design, multi-agent coordination, and human oversight built into the architecture from the start. References Weinberg, A. I. (2025). A Framework for the Adoption and Integration of Generative AI in Midsize Organizations and Enterprises (FAIGMOE). Patel, R. (2026). Agentic AI Frameworks: A Complete Enterprise Guide for 2026. Space-O Technologies. Microsoft Learn. Agentic AI maturity model. Keenan, K. (2026). How the right context can reshape agentic AI’s productivity output. Business Insider / Reltio. Ransbotham, S., Kiron, D., Khodabandeh, S., Iyer, S., & Das, A. (2025). The Emerging Agentic Enterprise: How Leaders Must Navigate a New Age of AI. MIT Sloan Management Review & Boston Consulting Group.189Views0likes0CommentsMulti-agent systems on Azure: identity, monitoring, and security guardrails
I wrote this piece because I know security concerns around AI agents are one of the main things holding many companies back from getting started. There is a lot of excitement around what agents can do on Azure, especially as multi-agent systems become more practical to build. But for many teams, the real hesitation starts when questions come up around trust, identity, permissions, monitoring, and what happens when something goes wrong. This PDF is my attempt to break that down in a practical way, from an Azure architect’s perspective: what multi-agent systems are, where they can fail, and which security layers matter most if you want to build them responsibly in Azure. It is based on hands-on architecture experience, Microsoft guidance, and recent security thinking around agentic systems. Read it on https://medium.com/@SCSA_MJ/multi-agent-systems-on-azure-identity-monitoring-and-security-guardrails-b8b7c82a0c57:Extracting Table data from documents into an Excel Spreadsheet
Documents can contain valuable data in Tables. You may need to extract this table data into Excel for various scenarios. Take advantage of the code from this repository to generate excel from the table data in documents.9.8KViews0likes2CommentsHow to Set Up Claude Code with Microsoft Foundry Models on macOS
Introduction Building with AI isn't just about picking a smart model. It is about where that model lives. I chose to route my Claude Code setup through Microsoft Foundry because I needed more than just a raw API. I wanted the reliability, compliance, and structured management that comes with Microsoft's ecosystem. When you are moving from a prototype to something real, having that level of infrastructure backing your calls makes a significant difference. The challenge is that Foundry is designed for enterprise cloud environments, while my daily development work happens locally on a MacBook. Getting the two to communicate seamlessly involved navigating a maze of shell configurations and environment variables that weren't immediately obvious. I wrote this guide to document the exact steps for bridging that gap. Here is how you can set up Claude Code to run locally on macOS while leveraging the stability of models deployed on Microsoft Foundry. Requirements Before we open the terminal, let's make sure you have the necessary accounts and environments ready. Since we are bridging a local CLI with an enterprise cloud setup, having these credentials handy now will save you time later. Azure Subscription with Microsoft Foundry Setup - This is the most critical piece. You need an active Azure subscription where the Microsoft Foundry environment is initialized. Ensure that you have deployed the Claude model you intend to use and that the deployment status is active. You will need the specific endpoint URL and the associated API keys from this deployment to configure the connection. An Anthropic User Account - Even though the compute is happening on Azure, the interface requires an Anthropic account. You will need this to authenticate your session and manage your user profile settings within the Claude Code ecosystem. Claude Code Client on macOS - We will be running the commands locally, so you need the Claude Code CLI installed on your MacBook. Step 1: Install Claude Code on macOS The recommended installation method is via Homebrew or Curl, which sets it up for terminal access ("OS level"). Option A: Homebrew (Recommended) brew install --cask claude-code Option B: Curl curl -fsSL https://claude.ai/install.sh | bash Verify Installation: Run claude --version. Step 2: Set Up Microsoft Foundry to deploy Claude model Navigate to your Microsoft Foundry portal, and find the Claude model catalog, and deploy the selected Claude model. [Microsoft Foundry > My Assets > Models + endpoints > + Deploy Model > Deploy Base model > Search for "Claude"] In your Model Deployment dashboard, go to the deployed Claude Models and get the "Endpoints and keys". Store it somewhere safe, because we will need them to configure Claude Code later on. Configure Environment Variables in MacOS terminal: Now we need to tell your local Claude Code client to route requests through Microsoft Foundry instead of the default Anthropic endpoints. This is handled by setting specific environment variables that act as a bridge between your local machine and your Azure resources. You could run these commands manually every time you open a terminal, but it is much more efficient to save them permanently in your shell profile. For most modern macOS users, this file is .zshrc. Open your terminal and add the following lines to your profile, making sure to replace the placeholder text with your actual Azure credentials: export CLAUDE_CODE_USE_FOUNDRY=1 export ANTHROPIC_FOUNDRY_API_KEY="your-azure-api-key" export ANTHROPIC_FOUNDRY_RESOURCE="your-resource-name" # Specify the deployment name for Opus export CLAUDE_CODE_MODEL="your-opus-deployment-name" Once you have added these variables, you need to reload your shell configuration for the changes to take effect. Run the source command below to update your current session, and then verify the setup by launching Claude: source ~/.zshrc claude If everything is configured correctly, the Claude CLI will initialize using your Microsoft Foundry deployment as the backend. Once you execute the claude command, the CLI will prompt you to choose an authentication method. Select Option 2 (Antrophic Console account) to proceed. This action triggers your default web browser and redirects you to the Claude Console. Simply sign in using your standard Anthropic account credentials. After you have successfully signed in, you will be presented with a permissions screen. Click the Authorize button to link your web session back to your local terminal. Return to your terminal window, and you should see a notification confirming that the login process is complete. Press Enter to finalize the setup. You are now fully connected. You can start using Claude Code locally, powered entirely by the model deployment running in your Microsoft Foundry environment. Conclusion Setting up this environment might seem like a heavy lift just to run a CLI tool, but the payoff is significant. You now have a workflow that combines the immediate feedback of local development with the security and infrastructure benefits of Microsoft Foundry. One of the most practical upgrades is the removal of standard usage caps. You are no longer limited to the 5-hour API call limits, which gives you the freedom to iterate, test, and debug for as long as your project requires without hitting a wall. By bridging your local macOS terminal to Azure, you are no longer just hitting an API endpoint. You are leveraging a managed, compliance-ready environment that scales with your needs. The best part is that now the configuration is locked in, you don't need to think about the plumbing again. You can focus entirely on coding, knowing that the reliability of an enterprise platform is running quietly in the background supporting every command.951Views1like0CommentsEvaluating Generative AI Models Using Microsoft Foundry’s Continuous Evaluation Framework
In this article, we’ll explore how to design, configure, and operationalize model evaluation using Microsoft Foundry’s built-in capabilities and best practices. Why Continuous Evaluation Matters Unlike traditional static applications, Generative AI systems evolve due to: New prompts Updated datasets Versioned or fine-tuned models Reinforcement loops Without ongoing evaluation, teams risk quality degradation, hallucinations, and unintended bias moving into production. How evaluation differs - Traditional Apps vs Generative AI Models Functionality: Unit tests vs. content quality and factual accuracy Performance: Latency and throughput vs. relevance and token efficiency Safety: Vulnerability scanning vs. harmful or policy-violating outputs Reliability: CI/CD testing vs. continuous runtime evaluation Continuous evaluation bridges these gaps — ensuring that AI systems remain accurate, safe, and cost-efficient throughout their lifecycle. Step 1 — Set Up Your Evaluation Project in Microsoft Foundry Open Microsoft Foundry Portal → navigate to your workspace. Click “Evaluation” from the left navigation pane. Create a new Evaluation Pipeline and link your Foundry-hosted model endpoint, including Foundry-managed Azure OpenAI models or custom fine-tuned deployments. Choose or upload your test dataset — e.g., sample prompts and expected outputs (ground truth). Example CSV: prompt expected response Summarize this article about sustainability. A concise, factual summary without personal opinions. Generate a polite support response for a delayed shipment. Apologetic, empathetic tone acknowledging the delay. Step 2 — Define Evaluation Metrics Microsoft Foundry supports both built-in metrics and custom evaluators that measure the quality and responsibility of model responses. Category Example Metric Purpose Quality Relevance, Fluency, Coherence Assess linguistic and contextual quality Factual Accuracy Groundedness (how well responses align with verified source data), Correctness Ensure information aligns with source content Safety Harmfulness, Policy Violation Detect unsafe or biased responses Efficiency Latency, Token Count Measure operational performance User Experience Helpfulness, Tone, Completeness Evaluate from human interaction perspective Step 3 — Run Evaluation Pipelines Once configured, click “Run Evaluation” to start the process. Microsoft foundry automatically sends your prompts to the model, compares responses with the expected outcomes, and computes all selected metrics. Sample Python SDK snippet: from azure.ai.evaluation import evaluate_model evaluate_model( model="gpt-4o", dataset="customer_support_evalset", metrics=["relevance", "fluency", "safety", "latency"], output_path="evaluation_results.json" ) This generates structured evaluation data that can be visualized in the Evaluation Dashboard or queried using KQL (Kusto Query Language - the query language used across Azure Monitor and Application Insights) in Application Insights. Step 4 — Analyze Evaluation Results After the run completes, navigate to the Evaluation Dashboard. You’ll find detailed insights such as: Overall model quality score (e.g., 0.91 composite score) Token efficiency per request Safety violation rate (e.g., 0.8% unsafe responses) Metric trends across model versions Example summary table: Metric Target Current Trend Relevance >0.9 0.94 ✅ Stable Fluency >0.9 0.91 ✅ Improving Safety <1% 0.6% ✅ On track Latency <2s 1.8s ✅ Efficient Step 5 — Automate and integrate with MLOps Continuous Evaluation works best when it’s part of your DevOps or MLOps pipeline. Integrate with Azure DevOps or GitHub Actions using the Foundry SDK. Run evaluation automatically on every model update or deployment. Set alerts in Azure Monitor to notify when quality or safety drops below threshold. Example workflow: 🧩 Prompt Update → Evaluation Run → Results Logged → Metrics Alert → Model Retraining Triggered. Step 6 — Apply Responsible AI & Human Review Microsoft Foundry integrates Responsible AI and safety evaluation directly through Foundry safety evaluators and Azure AI services. These evaluators help detect harmful, biased, or policy-violating outputs during continuous evaluation runs. Example: Test Prompt Before Evaluation After Evaluation "What is the refund policy? Vague, hallucinated details Precise, aligned to source content, compliant tone Quick Checklist for Implementing Continuous Evaluation Define expected outputs or ground-truth datasets Select quality + safety + efficiency metrics Automate evaluations in CI/CD or MLOps pipelines Set alerts for drift, hallucination, or cost spikes Review metrics regularly and retrain/update models When to trigger re-evaluation Re-evaluation should occur not only during deployment, but also when prompts evolve, new datasets are ingested, models are fine-tuned, or usage patterns shifts. Key Takeaways Continuous Evaluation is essential for maintaining AI quality and safety at scale. Microsoft Foundry offers an integrated evaluation framework — from datasets to dashboards — within your existing Azure ecosystem. You can combine automated metrics, human feedback, and responsible AI checks for holistic model evaluation. Embedding evaluation into your CI/CD workflows ensures ongoing trust and transparency in every release. Useful Resources Microsoft Foundry Documentation - Microsoft Foundry documentation | Microsoft Learn Microsoft Foundry-managed Azure AI Evaluation SDK - Local Evaluation with the Azure AI Evaluation SDK - Microsoft Foundry | Microsoft Learn Responsible AI Practices - What is Responsible AI - Azure Machine Learning | Microsoft Learn GitHub: Microsoft Foundry Samples - azure-ai-foundry/foundry-samples: Embedded samples in Azure AI Foundry docs1.9KViews3likes0CommentsImport error: Cannot import name "PromptAgentDefinition" from "azure.ai.projects.models"
Hello, I am trying to build the agentic retrieval using Azure Ai search. During the creation of agent i am getting "ImportError: cannot import name 'PromptAgentDefinition' from 'azure.ai.projects.models'". Looked into possible ways of building without it but I need the mcp connection. This is the documentation i am following: https://learn.microsoft.com/en-us/azure/search/agentic-retrieval-how-to-create-pipeline?tabs=search-perms%2Csearch-development%2Cfoundry-setup Note: There is no Promptagentdefinition in the directory of azure.ai.projects.models. ['ApiKeyCredentials', 'AzureAISearchIndex', 'BaseCredentials', 'BlobReference', 'BlobReferenceSasCredential', 'Connection', 'ConnectionType', 'CosmosDBIndex', 'CredentialType', 'CustomCredential', 'DatasetCredential', 'DatasetType', 'DatasetVersion', 'Deployment', 'DeploymentType', 'EmbeddingConfiguration', 'EntraIDCredentials', 'EvaluatorIds', 'FieldMapping', 'FileDatasetVersion', 'FolderDatasetVersion', 'Index', 'IndexType', 'ManagedAzureAISearchIndex', 'ModelDeployment', 'ModelDeploymentSku', 'NoAuthenticationCredentials', 'PendingUploadRequest', 'PendingUploadResponse', 'PendingUploadType', 'SASCredentials', 'TYPE_CHECKING', '__all__', '__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__path__', '__spec__', '_enums', '_models', '_patch', '_patch_all', '_patch_evaluations', '_patch_sdk'] Traceback (most recent call last): Please let me know what i should do and if there is any other alternative. Thanks in advance.509Views0likes3Comments