Recent Discussions
New AI Foundry not sending refresh tokens to MCP (401 after access token expiration)
Hello,

When connecting an MCP server hosted as an Azure Function using OAuth Passthrough in New AI Foundry Playground, the connection is established successfully, the Microsoft login popup appears, authentication succeeds, and the first MCP request returns data correctly.

However, once the access token expires, neither the Playground nor AI Foundry agents deployed to Copilot/Teams appear to refresh the token, despite offline_access being included in the requested scopes and a refresh URL being configured. All subsequent MCP calls fail with 401 Unauthorized until the connection is manually recreated. For testing, we reduced the token lifetime to 10 minutes to make the issue easier to reproduce.

Impact
This prevents long-lived or repeated MCP usage in AI Foundry Playground because the connection becomes unusable after token expiry and requires manual reconnection.

MCP server host: Azure Function

MCP server configuration in New AI Foundry:
Endpoint: https://mcp-test-obo-rls-fabric.azurewebsites.net/mcp
Client ID: <client_id> (redacted)
Auth URL: https://login.microsoftonline.com/tenant_id(redacted)/oauth2/v2.0/authorize
Token URL: https://login.microsoftonline.com/tenant_id(redacted)/oauth2/v2.0/token
Refresh URL: https://login.microsoftonline.com/tenant_id(redacted)/oauth2/v2.0/token
Scopes: openid profile offline_access api://(App ID)/user_impersonation
Redirect URI: https://global.consent.azure-apim.net/redirect/(redacted)-fabric-rls-mcp

Error returned
tool_user_error: Authentication failed when connecting to the MCP server: https://mcp-test-obo-rls-fabric.azurewebsites.net:443/mcp : Response status code does not indicate success: 401 (Unauthorized). Response body: {"code":401,"message":"IDX10223: Lifetime validation failed. The token is expired. ValidTo (UTC): '03/30/2026 08:19:28', Current time (UTC): '03/30/2026 08:29:55'."}. Verify your authentication headers. Suggestions: First verify the required permissions. If the access token is expired or revoked, recreate the connection. If this connection is shared by other users or workflows, recreate it carefully to avoid disruption.

Function App / Identity Provider (Entra)
The Azure Function authentication configuration:
Identity provider: Microsoft (MCP-Fabric-RLS-Server)
App registration: MCP-Fabric-RLS-Server
Supported account types: Single tenant
Application (client) ID: App ID
Client secret setting name: MICROSOFT_PROVIDER_AUTHENTICATION_SECRET
Issuer URL: https://login.microsoftonline.com/tenant_id(redacted)/v2.0
Redirect URI is added to App Authentication Redirect URI configuration as "Web".

Allowed token audiences:
api://App ID
App ID
https://mcp-test-obo-rls-fabric.azurewebsites.net

Additional checks enabled:
Allow requests from any application
Allow requests from any identity
Allow requests only from issuer tenant tenant_id (redacted)

Notes
The first authenticated call succeeds, so the initial OAuth flow appears to be working. The failure only occurs after the access token expires. Because offline_access is requested and the refresh URL is configured, our expectation is that the client should refresh the token automatically.

Our working hypothesis is that either:
the refresh token is not being issued,
the refresh token is not being stored/used by AI Foundry Playground, or
OAuth Passthrough for MCP connections in this scenario does not currently support automatic refresh as expected.

Thank you for any assistance provided.

Microsoft Foundry Agent via Responses API rejects local image input as Base64 data URL / byte array
Hello everyone,

we are seeing an issue with the new Microsoft Foundry Agents via the Responses API when sending a local image as part of the user message.

What works
text-only input
image by public URL

What fails
local PNG passed as Base64 data URL
local PNG passed as raw byte array through SDK methods

Example failing image part:
{ "type": "input_image", "image_url": "data:image/png;base64,...", "detail": "auto" }

Returned error:
{ "code": "invalid_payload", "message": "The provided data does not match the expected schema", "param": "/", "type": "invalid_request_error", "details": [] }

We reproduced this in:
C#
Python
raw REST

So this does not appear to be limited to one SDK.

Also important: the same pattern is used in the sample repo for the Foundry Agent Web App, and this scenario worked for us about one week ago: https://github.com/microsoft-foundry/foundry-agent-webapp

Could you confirm whether local image input is currently supported for Foundry Agents through the Responses API, or whether this is a regression?

Best regards

The Business Foundation: Why Most Companies Aren’t Ready for Agentic AI
Before agents can execute decisions, organizations must redesign how they structure responsibility, data, governance, and operational context before autonomy can scale. The enterprise AI landscape has shifted. Organizations are moving beyond chatbots and isolated predictive models toward systems that can plan, decide, and execute multi-step work across finance, engineering operations, supply chains, and customer service. Many analysts now expect agentic AI to unlock major productivity gains across knowledge work. But despite the momentum, adoption remains limited. As of 2025, only about 2% of organizations have deployed agent-based systems at real operational scale, while most remain stuck in pilots. The reason is not model capability. It is readiness. The Core Problem Most organizations still treat AI adoption as a technical rollout exercise and measure progress through deployment indicators such as copilots enabled, pilots launched, or models evaluated. These metrics reflect experimentation activity, but they do not show whether an organization is ready to operate systems that make decisions and execute actions inside business workflows. Agentic systems do more than generate insights; they participate directly in operational processes. The gap between deploying AI tools and safely delegating decision-making authority to them is where many transformation efforts begin to stall. True enterprise readiness for agentic AI is not defined by how many models an organization deploys or how many pilots it launches. It depends on whether the organization can safely delegate bounded decisions to autonomous systems. 
In practice, this requires: Strategy and decision scoping: identifying where autonomous execution creates value and where human oversight must remain in place Process and decision-system maturity: redesigning workflows for human-agent collaboration with clear escalation boundaries Context-ready data foundations: ensuring agents operate on consistent, policy-aware operational context rather than fragmented data silos Governance and accountability structures: defining what agents may recommend, execute, escalate, or never touch, supported by auditability and oversight Team readiness and lifecycle management: preparing teams to supervise autonomous execution and managing agents as ongoing operational participants rather than static tools Coordination architecture readiness: aligning multiple agents across domains so local optimization does not create organizational conflict This article explains why traditional enterprise environments are not yet prepared for autonomous agents, what true agentic readiness actually looks like in practice, and the sequence of organizational changes required before decision-capable systems can be deployed safely at scale. I. The Readiness Illusion and the Root Causes of Failure Most organizations are deploying agentic systems into environments designed exclusively for human execution. That mismatch produces predictable friction across five structural layers. 1. Fragmented Operational Context (The Data Problem) Enterprises have a lot of data. What they often lack is usable context. Traditional systems record what happened. Agents also need to understand why something happened, how systems are connected, and where policy limits apply. In most organizations, customer systems, telemetry platforms, identity services, and finance tools do not stay aligned in real time. As a result, agents operate across disconnected information rather than a shared operational picture. This creates real risk. 
With generative AI, poor data quality usually produces a weak answer. With agentic AI, poor data quality can produce the wrong action at scale. More APIs, more pipelines, and more dashboards do not fix this by themselves. Without a shared semantic context across systems, agents can still make decisions that are internally logical but operationally wrong. For example, an agent may see that a customer received a large discount and conclude that future discounts should be limited, while missing that the original discount was approved because of a service outage and a retention risk. The data is available, but the business meaning behind it is not. 2. Undocumented Decision Systems Most organizations document workflows. However, very few document decision authority clearly enough for autonomous execution. Agents need to know where they are allowed to act, when they must escalate, and which decisions remain human-only. Without these boundaries, organizations often follow the same pattern: the first unexpected situation appears, confidence drops, and the agent is switched off. This is not a model problem. It is a decision-structure problem. Before deploying agents, organizations must be able to explain which decisions can be delegated and who remains responsible for each step. Many cannot yet do this. 3. The Governance Paradox Agentic systems do not fit traditional governance models. Most organizations still assume a simple structure: user → application → resource Agent-based systems introduce a new layer: user → agent → tools → resource This change affects access control, compliance processes, and audit visibility. Organizations usually buy agents like software tools but must manage them more like team members. That gap is rarely addressed before deployment begins. This issue is already visible today. Many enterprises are using vendor copilots and embedded AI features inside business systems without clear ownership, audit coverage, or governance rules. 
This creates a growing “shadow AI” layer even before intentional agent programs start. 4. Identity and Accountability Ambiguity Many organizations cannot clearly answer a simple question: who is responsible when an agent makes a mistake? In practice, agents often receive permissions that are broader than necessary, execution traces are difficult to follow across multiple systems, and accountability is split between IT, compliance, and business teams. Without clear attribution, autonomy introduces hidden risk instead of efficiency. Delegation without accountability is not automation. It is unmanaged risk. 5. Organizational Misalignment Most transformation programs still assume employees will use AI as a tool. Agentic environments change the role of employees from operators to supervisors. People are expected to review outcomes, guide behavior, and manage exceptions instead of executing every step themselves. Research from BCG shows that around 70% of AI project challenges come from people and process issues rather than technology. Organizations that invest in change management are significantly more likely to see successful results. Organizational readiness is not something to address later. It is required before agents can operate safely. Common Failure Patterns at a Glance Common failure patterns like these are already visible in real deployments. The Klarna case illustrates the challenge well. After replacing several hundred customer service roles with AI, the company later reported lower resolution quality for complex cases, declining satisfaction scores, and higher escalation rates, which led to renewed hiring in support roles. The outcome did not point to a failure of the model itself. It highlighted what happens when autonomous systems are introduced without the supporting process, governance, and team structures required for sustained operation. II. Defining True Agentic Readiness Agentic readiness is not just about having the right tools in place. 
It is about whether the organization has the capability to use autonomous systems safely and effectively. Definition Agentic readiness is the ability to safely delegate bounded operational decisions to autonomous systems while maintaining accountability, observability, and policy alignment across the full execution chain. Research consistently shows that organizations benefit from AI only when multiple maturity layers advance together. The MIT CISR AI Maturity Model, based on data from 721 companies, demonstrates that financial performance improves as organizations progress through the stages. Companies in early stages often perform below industry averages, while those reaching later stages perform significantly better. The key insight is that maturity is cumulative. Organizations cannot skip foundational steps and still expect reliable outcomes. For agentic systems, those cumulative layers include strategy alignment, decision-ready processes, context-ready data, governance structures, organizational roles, and technical architecture. When only some of these elements are in place, organizations produce pilots. When they advance together, organizations produce transformation. From Activity Metrics to Outcome Metrics One of the clearest signs of readiness is how an organization measures progress. Organizations at an early stage usually focus on activity: Number of models deployed Pilots launched Features enabled User onboarding numbers and API call volume More mature organizations focus on outcomes: Better decision quality and fewer errors Higher throughput for clearly defined tasks Consistent operation within safe autonomy boundaries Complete audit trails and accurate escalation handling This is not a semantic distinction. Organizations measuring activity invest indefinitely in pilots because they have no signal telling them a pilot has succeeded or failed. The measurement framework is itself a prerequisite for the transformation sequence. III. 
The Transformation Sequence Most Organizations Skip Many organizations begin agent adoption in the wrong order. Platforms are procured before governance is defined. Models are evaluated before workflows are structured. Autonomy is introduced before decision authority is mapped. The result is not faster progress. It is earlier failure, followed by expensive cleanup later. In traditional cloud transformation, architecture precedes automation. Agentic transformation follows the same rule: decision structure must exist before delegation can scale. Step 1: Strategic Alignment and Decision Scoping Organizations should begin by identifying where autonomy creates value safely — not where it is technically possible and not where ambitions are highest. Strong early candidates usually share the same characteristics: structured decisions, bounded scope, reversible actions, and high execution frequency. Typical examples include incident triage and routing, capacity classification, environment status updates, and prioritization support. These are good starting points not because they are simple, but because failures are visible, recoverable, and useful for learning. Delegation should grow gradually from bounded decision spaces toward broader authority. Organizations that struggle often start with highly visible, high-risk use cases because the business case looks attractive. Organizations that succeed usually begin with frequent, lower-impact decisions where feedback loops are short and improvements can happen quickly. Step 2: Process Maturity and Boundary Setting Agents do not fix broken workflows. They execute them faster. If a process depends on informal judgment, tribal knowledge, or undocumented exception handling, an agent will reproduce those weaknesses at machine speed. 
Before introducing autonomy, organizations should establish structured runbooks with clear execution paths, explicit escalation logic an agent can evaluate, defined exception-handling rules that do not rely on intuition, and clear boundaries between decisions an agent may take and those that must remain with humans. This level of discipline requires documentation precision that many organizations have never needed before. A statement such as “the engineer uses judgment” is not a runbook step. It is an undocumented dependency that will later appear as an agent failure. This is also where leaders face a practical choice: add agents on top of fragile legacy workflows, or redesign those workflows so delegation can happen safely. In many cases, the second path is slower at first but far more durable. Step 3: Data Context and Decision Awareness Agents cannot operate reliably in fragmented environments. The solution is not simply collecting more data. What they require is decision-aware context: structured knowledge about relationships between systems, service dependencies, environment classification, policy boundaries, and operational intent. This is a different challenge from building analytics platforms. Analytics depends on broad visibility across large datasets. Agentic execution depends on precise, current, and consistent information at the moment a decision is made. A customer record that is accurate enough for reporting may not be reliable enough for an agent executing a contract action. Because of this difference, data readiness becomes a leadership concern rather than only an infrastructure task. Microsoft’s digital transformation guidance captures this clearly with the principle “no AI without data”: organizations should identify critical data sources, establish governance ownership, improve quality, and define controlled access before introducing agents into operational workflows. 
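The discount example from the data section shows what decision-aware context adds in practice. The sketch below is illustrative only: the `DiscountRecord` fields, the reason codes, and the 20% threshold are hypothetical and are not taken from any product or from the sources cited in this article. It contrasts a rule that sees only the amount with one that also sees why the discount was approved.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical reason codes that mark a discount as a one-off remediation.
REMEDIATION_REASONS = {"service_outage", "retention_risk"}

# Hypothetical threshold above which a discount counts as "large".
LARGE_DISCOUNT_PERCENT = 20.0


@dataclass(frozen=True)
class DiscountRecord:
    customer_id: str
    percent: float
    reason: Optional[str] = None  # the "why" that transaction data often omits


def limit_future_discounts_naive(record: DiscountRecord) -> bool:
    """Amount-only rule: any large discount triggers a limit."""
    return record.percent >= LARGE_DISCOUNT_PERCENT


def limit_future_discounts_with_context(record: DiscountRecord) -> bool:
    """Context-aware rule: remediation discounts do not count against the customer."""
    if record.reason in REMEDIATION_REASONS:
        return False
    return record.percent >= LARGE_DISCOUNT_PERCENT


if __name__ == "__main__":
    outage_credit = DiscountRecord("C-1001", 30.0, reason="service_outage")
    print(limit_future_discounts_naive(outage_credit))
    print(limit_future_discounts_with_context(outage_credit))
```

Both functions read the same record. The only difference is whether the business meaning travels with the data, which is the gap the article describes.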
Step 4: Governance and Delegation Redesign Organizations must explicitly define four categories of agent authority before deployment: What agents may recommend (advisory boundary) What agents may execute autonomously (execution boundary) What requires human approval before execution (escalation boundary) What remains permanently restricted regardless of confidence (prohibition boundary) These policies cannot remain static. Agentic systems require continuous supervision, not periodic review. Research supports this shift. Studies of governance professionals working with autonomous systems show that adopting traditional Enterprise Risk Management frameworks alone does not significantly reduce governance incidents. What makes the difference is integrating human oversight into execution loops and strengthening machine identity security. In practice, this means organizations need a delegated-autonomy governance function: a cross-functional group with representation from IT, compliance, legal, and business teams that continuously defines and monitors the boundaries of agent behavior. This is different from extending existing approval committees. Governance must move from acting as a gate before deployment to operating as a supervision layer throughout the lifecycle of the agent. This creates a basic operational tension: organizations adopt agents to reduce manual work, but safe autonomy requires stronger supervision, better observability, and tighter control over identity and permissions — especially in the early stages. Step 5: Operating Model Redesign: Operationalizing Human-Agent Collaboration Agentic systems create responsibilities that do not yet exist in most organizations. This shift is not mainly about replacing people with agents. It is about redesigning how people work with them, supervise them, and remain accountable for outcomes. 
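The four authority boundaries from Step 4 only help if an agent runtime can evaluate them. One way to picture that is a small policy table with a safe default. This is a sketch under stated assumptions: the action names, the `POLICY` mapping, and the rule that unlisted actions escalate to a human are all hypothetical.

```python
from enum import Enum


class Authority(Enum):
    RECOMMEND = "recommend"    # advisory boundary
    EXECUTE = "execute"        # execution boundary
    ESCALATE = "escalate"      # human approval required before execution
    PROHIBITED = "prohibited"  # never allowed, regardless of confidence


# Hypothetical policy table; real entries would come from the governance function.
POLICY = {
    "suggest_capacity_change": Authority.RECOMMEND,
    "route_incident": Authority.EXECUTE,
    "issue_refund": Authority.ESCALATE,
    "delete_customer_data": Authority.PROHIBITED,
}


def authority_for(action: str) -> Authority:
    """Look up an action; anything not listed defaults to escalation (assumed rule)."""
    return POLICY.get(action, Authority.ESCALATE)


def may_execute_autonomously(action: str) -> bool:
    """Only actions explicitly placed inside the execution boundary run unattended."""
    return authority_for(action) is Authority.EXECUTE
```

Defaulting unknown actions to escalation rather than execution is a design choice in this sketch. It keeps new or unanticipated actions under human review until someone has classified them.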
New operational roles typically include: Agent reliability engineers who monitor performance, detect degradation, and define retraining triggers Policy designers who translate business rules into machine-evaluable decision logic Workflow supervisors who oversee autonomous execution and handle escalations Context curators who maintain the data foundations agents depend on for accurate reasoning Organizations that succeed with agents do not treat them as static automation tools. They treat them as managed participants inside workflows. That is why they need an HR layer for agents. An HR layer for agents means applying the same lifecycle thinking used for people to autonomous systems. Before an agent is allowed to operate, it needs a clearly defined role, scope, level of authority, and access to the right systems. Once deployed, its performance must be reviewed over time, its behavior monitored, and its permissions adjusted when quality drops or risks increase. When the agent no longer fits the workflow, it should be retired or replaced instead of being left running by default. In practice, this means agent management should include: Onboarding, by defining scope, authority, and access boundaries Supervision, through observability, escalation paths, and performance review Retraining or re-scoping, when quality declines or conditions change Retirement, when the agent no longer fits the process or creates more risk than value In higher-risk workflows, this HR layer must also include graceful degradation. For example, an underperforming agent may automatically lose write access, be moved to read-only mode, and hand control back to a human supervisor until its behavior is corrected. This shift also requires leadership readiness. The Harvard 2025 Global Leadership Development Study found that 71% of senior leaders now see the ability to lead through continuous change as critical, yet only 36% say AI is fully integrated into their strategy. 
That gap between intention and execution is where many organizational transformation programs begin to stall. Step 6: Coordination Architecture Readiness As organizations deploy agents across multiple domains, a new challenge appears: agents begin optimizing locally instead of organizationally. An agent focused on cost efficiency in one area may conflict with another agent responsible for quality assurance elsewhere. Without coordination structures, these conflicts often remain invisible until they surface as operational failures. Coordination architecture helps align agent behavior across the organization. It ensures policy consistency between agents, maintains a shared understanding of the operational environment, prevents conflicts when actions intersect, and supports stable communication between agents working together across workflows. This capability is not required for the first agent deployment. It becomes important as soon as organizations begin operating multiple agents in parallel. Many organizations encounter coordination problems earlier than expected, which is why coordination readiness belongs in the transformation sequence even if its implementation happens later. Local optimization is rarely what enterprises intend. Coordination architecture is how you prevent it from becoming what they get. IV. The Regulatory Clock Is Already Running For organizations operating in or serving European markets, readiness is no longer only a strategic question. It is also a regulatory one. The EU AI Act’s high-risk provisions take effect in August 2026, with potential penalties reaching €35 million or 7% of global revenue. Colorado’s AI Act follows in June 2026, and a growing number of U.S. states now require documented AI governance programs for specific sectors and use cases. The governance and data foundations described earlier in this article are therefore not only best practice. For many organizations, they are becoming compliance prerequisites. 
Treating readiness as optional before deployment increasingly means accepting regulatory exposure before value is realized. The transformation sequence described here is not a slower path to deployment. It is the only path that avoids accumulating technical and legal risk at the same time. V. Conclusion: Shifting Toward Outcome-Based Pragmatism Agentic systems rarely fail because language models are incapable. They fail because they are introduced into environments designed for human execution, governed by frameworks built for deterministic software, and evaluated using metrics that cannot distinguish a promising pilot from a production-ready capability. The readiness gap is structural and, in many cases, self-inflicted. Organizations skip foundational steps because platform procurement is faster, more visible, and easier to justify internally than operating-model redesign. The result is earlier failure, higher remediation cost, and — in regulated industries — increasing legal exposure. What this means in practice Organizations should stop measuring readiness through activity indicators and start measuring it through decision quality, execution safety, throughput improvement, and bounded autonomy performance. Governance and data foundations must be established before platform rollout. Organizational transition planning must begin before deployment. Decision authority must be defined before the first agent workflow is introduced. Only then can enterprises safely unlock the productivity gains promised by agentic systems — not because the technology suddenly becomes capable, but because the organization becomes ready to use it. Up Next in This Series Part 2 looks at the cloud foundation needed for safe agent deployment, including identity-first architecture, observability, policy controls, and the platform constraints that often appear only after design decisions have been made. 
Part 3 focuses on how to design agents that work reliably in enterprise environments, including RAG maturity, loop design, multi-agent coordination, and human oversight built into the architecture from the start.

References
Weinberg, A. I. (2025). A Framework for the Adoption and Integration of Generative AI in Midsize Organizations and Enterprises (FAIGMOE).
Patel, R. (2026). Agentic AI Frameworks: A Complete Enterprise Guide for 2026. Space-O Technologies.
Microsoft Learn. Agentic AI maturity model.
Keenan, K. (2026). How the right context can reshape agentic AI’s productivity output. Business Insider / Reltio.
Ransbotham, S., Kiron, D., Khodabandeh, S., Iyer, S., & Das, A. (2025). The Emerging Agentic Enterprise: How Leaders Must Navigate a New Age of AI. MIT Sloan Management Review & Boston Consulting Group.

o3-mini not returning reasoning tokens
Hi, I work on a service that leverages o3-mini via Microsoft Foundry. In the past few days, I've observed that when calling o3-mini via Microsoft Foundry, completion_tokens_details always has the reasoning_tokens value set to 0, regardless of the reasoning setting being used. In my testing, it seems that the reasoning is still occurring, as increasing the reasoning value causes the completion_tokens field to increase by a good amount, but none of the reasoning levels cause the reasoning_tokens value to be anything other than 0. Has anyone else encountered this issue? Thanks! Tom

Enabling Secure Access to Private Resources with Azure AI Foundry
One of our client’s key requirements was to build an AI agent that could securely access private resources without exposing any data over the public internet. To meet this requirement, we followed an architecture similar to the diagram above, leveraging Azure AI Foundry with private networking. How We Designed the Solution As shown in the diagram, all core services - Azure Storage, AI Search, Foundry, and Cosmos DB - are placed behind private endpoints within the client’s virtual network. This ensures that none of these resources are publicly accessible. We deployed the agent inside a dedicated subnet within the same VNet. This allowed the agent to communicate directly with these services through private endpoints, without any need to traverse the public internet. The private endpoint subnet acts as a secure bridge between the agent and the underlying Azure services. At the same time, the client has full control over the network, including the option to apply firewall rules to manage outbound traffic. Why This Approach Worked All communication between the agent and data sources stays within the private network. Sensitive data, including queries and retrieved content, never leaves the network boundary. Access to resources is controlled through private endpoints and proper authorization. This design removes the risks associated with public endpoints and ensures compliance with enterprise security requirements. Final Outcome Using this approach, we delivered an AI solution that is both secure and scalable. The client now has an agent that can safely interact with private data sources while maintaining full control over network traffic and access policies. 
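When services sit behind private endpoints, a quick sanity check from inside the VNet is whether each service hostname resolves to a private address. The sketch below assumes private DNS is configured so that service names resolve to private IPs. The helper names are hypothetical, and the hostname in the usage comment is a placeholder, not the client's.

```python
import ipaddress
import socket


def is_private_address(ip: str) -> bool:
    """True for RFC 1918 and other non-public ranges, as classified by the stdlib."""
    # Drop an IPv6 zone suffix such as "%eth0" before parsing.
    return ipaddress.ip_address(ip.split("%")[0]).is_private


def resolves_privately(hostname: str, port: int = 443) -> bool:
    """Resolve a hostname and report whether every returned address is private."""
    infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    addresses = {info[4][0] for info in infos}
    return bool(addresses) and all(is_private_address(a) for a in addresses)


# Usage (run from a machine inside the VNet; placeholder hostname):
# resolves_privately("<storage-account>.blob.core.windows.net")
```

A public address in the result would suggest that DNS for that service is not pointing at the private endpoint, which is worth checking before relying on the network boundary.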
To learn more about the configuration, follow this documentation: https://learn.microsoft.com/en-us/azure/foundry/agents/how-to/virtual-networks

How are you currently securing your AI workloads when accessing sensitive data?

Multi-agent systems on Azure: identity, monitoring, and security guardrails
I wrote this piece because I know security concerns around AI agents are one of the main things holding many companies back from getting started. There is a lot of excitement around what agents can do on Azure, especially as multi-agent systems become more practical to build. But for many teams, the real hesitation starts when questions come up around trust, identity, permissions, monitoring, and what happens when something goes wrong.

This PDF is my attempt to break that down in a practical way, from an Azure architect’s perspective: what multi-agent systems are, where they can fail, and which security layers matter most if you want to build them responsibly in Azure. It is based on hands-on architecture experience, Microsoft guidance, and recent security thinking around agentic systems.

Read it on Medium: https://medium.com/@SCSA_MJ/multi-agent-systems-on-azure-identity-monitoring-and-security-guardrails-b8b7c82a0c57

Typo in Azure Foundry Learn
Hi Microsoft Foundry, I am not sure if this is the right place to post this, but I just wanted to report that there is a typo on this specific page: https://learn.microsoft.com/en-us/azure/foundry/openai/supported-languages?tabs=dotnet-secure%2Csecure%2Cpython-entra&pivots=programming-language-python Have a nice day.

o3-deep-research fails with status "incomplete" and reason "content_filter"
I am working on deep research over internal data. I'm currently using the Azure OpenAI Responses API with the MCP tool. The underlying MCP server is deployed to ACA with search and fetch tools whose signatures comply with the specification (https://developers.openai.com/apps-sdk/build/mcp-server#company-knowledge-compatibility). The OpenAI client is created with the o3-deep-research model and the MCP tool, and the response status is checked in a loop (https://learn.microsoft.com/en-us/azure/foundry/openai/how-to/deep-research#remote-mcp-server-with-deep-research).

Deep research runs for some time. I can see in the log that the handshake is made, ListTools is invoked, the search tool is called, and then fetch is called for the queries framed by the model. But intermittently, the response status becomes "incomplete" with the incomplete reason "content_filter". Otherwise the deep research works fine.

I am not able to identify the root cause, as there seems to be no way to tell what triggered the content filter, whether it was the prompt or the completion. How can I debug this, find the root cause, and fix it? Is there a known issue with the o3-deep-research model's intermediate reasoning completions, or are the search and fetch tool results causing this?

I uploaded a file and made it available to the MCP server. The search and fetch tools use an Azure OpenAI agent to search the data using File Search, and the fetch tool gets the content of the file based on the id passed. For the same file and the same research topic, the issue does not occur every time, only intermittently.

Foundry Agent deployed to Copilot/Teams Can't Display Images Generated via Code Interpreter
Hello everyone, I’ve been developing an agent in the new Microsoft Foundry and enabled the Code Interpreter tool for it. In Agent Playground, I can successfully start a new chat and have the agent generate a chart/image using Code Interpreter. This works as expected in both the old and new Foundry experiences.
However, after publishing the agent to Copilot/Teams for my organization, the same prompt that works in Agent Playground does not function properly. The agent appears to execute the code, but the image is not accessible in Teams. When reviewing the agent traces (via the Traces tab in Foundry), I can see that the agent generates a link to the image in the Code Interpreter sandbox environment, for example:
`[Download the bar chart](sandbox:/mnt/data/bar_chart.png)`
This works correctly within Foundry, but the sandbox path is not accessible from Teams, so the link fails there.
Is there an officially supported way to surface Code Interpreter–generated files/images when the agent is deployed to Copilot/Teams, or is the recommended approach perhaps to implement a custom tool that uploads generated files to an external storage location (e.g., SharePoint, Blob Storage, or another file hosting service) and returns a publicly accessible link instead? I've been having trouble finding anything about this online. Any guidance would be greatly appreciated. Thank you!
168 Views 0 likes 0 Comments
New Foundry Agent Issue
Hi all, I’m creating my first agent via New Foundry, so my questions are probably basic. As always, everything seemed straightforward… until deployment.
I created an agent using gpt-4.1, added a list of instructions, and then used the Tools → Upload files functionality to attach a selection of reference documents. Everything worked perfectly in Preview mode. I then used the default option to Create a bot service, and it deployed successfully. To test it, I used the Individual Scope option (with the intention to share later with a couple of people; I haven’t worked that part out yet).
Like magic, it appeared in my Teams and M365 Copilot, which was amazing… and then I ran my first search. It thought for a long time and then returned an error. In Co-pilot: and Teams: Nothing happens at all
I’ve looked around for help but drawn a blank. I’m fairly sure it’s some kind of permissioning / access issue somewhere, but I can’t find where. Any help would be hugely appreciated.
125 Views 0 likes 0 Comments
Is there a way to connect 2 Ai foundry to the same cosmos containers?
I defined an Azure AI Foundry connection for Azure Cosmos DB and BYO Thread Storage in Azure AI Agent Service by using these instructions: Integration with Azure AI Agent Service - Azure Cosmos DB for NoSQL | Microsoft Learn
I see that it created 3 containers under the Cosmos DB I provided:
<guid>-agent-entity-store
v-system-thread-message-store
<guid>-thread-message-store
I then created another AI Foundry and added a connection to the same Cosmos DB, and it created 3 different containers under the same DB. Is there a way to make both use exactly the same containers? I want to use multiple AI Foundries that all use the same Cosmos containers to manage the data.
81 Views 0 likes 0 Comments
Searching for a simple guide to index SharePoint and publish an agent in Foundry
Hey all, Does anyone have a good guide or best practices for this setup in Foundry?
SharePoint as data source
GPT model (document + image indexing, ideally vectorized/embeddings)
Create an Agent and Share the Agent
Restrict access to Agent to specific users/groups only
Looking for tutorials, examples, or real-world setups. Thanks!
87 Views 0 likes 0 Comments
Get to know the core Foundry solutions
Foundry includes specialized services for vision, language, documents, and search, plus Microsoft Foundry for orchestration and governance. Here’s what each does and why it matters:
Azure Vision
With Azure Vision, you can detect common objects in images, generate captions, descriptions, and tags based on image contents, and read text in images.
Example: Automate visual inspections or extract text from scanned documents.
Azure Language
Azure Language helps organizations understand and work with text at scale. It can identify key information, gauge sentiment, and create summaries from large volumes of content. It also supports building conversational experiences and question-answering tools, making it easier to deliver fast, accurate responses to customers and employees.
Example: Understand customer feedback or translate text into multiple languages.
Azure Document Intelligence
With Azure Document Intelligence, you can use pre-built or custom models to extract fields from complex documents such as invoices, receipts, and forms.
Example: Automate invoice processing or contract review.
Azure Search
Azure Search helps you find the right information quickly by turning your content into a searchable index. It uses AI to understand and organize data, making it easier to retrieve relevant insights. This capability is often used to connect enterprise data with generative AI, ensuring responses are accurate and grounded in trusted information.
Example: Help employees retrieve policies or product details without digging through files.
Microsoft Foundry
Acts as the orchestration and governance layer for generative AI and AI agents. It provides tools for model selection, safety, observability, and lifecycle management.
Example: Coordinate workflows that combine multiple AI capabilities with compliance and monitoring.
Business leaders often ask: Which Foundry tool should I use? The answer depends on your workflow. For example:
Are you trying to automate document-heavy processes like invoice handling or contract review?
Do you need to improve customer engagement with multilingual support or sentiment analysis?
Or are you looking to orchestrate generative AI across multiple processes for marketing or operations?
Connecting these needs to the right Foundry solution ensures you invest in technology that delivers measurable results.
Open-Source SDK for Evaluating AI Model Outputs (Sharing Resource)
Hi everyone, I wanted to share a helpful open-source resource for developers working with LLMs, AI agents, or prompt-based applications. One common challenge in AI development is evaluating model outputs in a consistent and structured way. Manual evaluation can be subjective and time-consuming. The project below provides a framework to help with that:
AI-Evaluation SDK https://github.com/future-agi/ai-evaluation
Key Features:
- Ready-to-use evaluation metrics
- Supports text, image, and audio evaluation
- Pre-defined prompt templates
- Quickstart examples available in Python and TypeScript
- Can integrate with workflows using toolkits like LangChain
Use Case: If you are comparing different models or experimenting with prompt variations, this SDK helps standardize the evaluation process and reduces manual scoring effort.
If anyone has experience with other evaluation tools or best practices, I’d be interested to hear what approaches you use.
76 Views 0 likes 0 Comments
Azure AI foundry SDK-Tool Approval Not Triggered When Using ConnectedAgentTool() Between Agents
I am building an orchestration workflow in Azure AI Foundry using the Python SDK. Each agent uses tools exposed via an MCP server (deployed in an Azure Container App), and individual agents work perfectly when run independently: tool approval is triggered, and execution proceeds as expected. I have a main agent that orchestrates the flow of these individual agents.
However, when I connect one agent to another using ConnectedAgentTool(), the tool approval flow never occurs, and orchestration halts. All I see is the run status as IN-PROGRESS for some time, and then the run exits. The downstream (child) agent is never invoked. I have tried mcp_tool.set_approval_mode("never"), but that didn't help.
Auto-Approval Implementation:
I have implemented a polling loop that checks the run status and auto-approves any requires_action events.

    import asyncio
    from azure.ai.projects.aio import AIProjectClient
    from azure.ai.agents.models import SubmitToolApprovalAction, ToolApproval

    async def poll_run_until_complete(project_client: AIProjectClient, thread_id: str, run_id: str):
        """
        Polls the run until completion.
        Auto-approves any tool calls encountered.
        """
        while True:
            run = await project_client.agents.runs.get(thread_id=thread_id, run_id=run_id)
            status = getattr(run, "status", None)
            print(f"[poll] Run {run_id} status: {status}")

            # Completed states
            if status in ("succeeded", "failed", "cancelled", "completed"):
                print(f"[poll] Final run status: {status}")
                if status == "failed":
                    print("Run last_error:", getattr(run, "last_error", None))
                return run

            # Auto-approve any tool calls
            if status == "requires_action" and isinstance(getattr(run, "required_action", None), SubmitToolApprovalAction):
                submit_action = run.required_action.submit_tool_approval
                tool_calls = getattr(submit_action, "tool_calls", []) or []
                if not tool_calls:
                    print("[poll] requires_action but no tool_calls found. Waiting...")
                else:
                    approvals = []
                    for tc in tool_calls:
                        print(f"[poll] Auto-approving tool call: {tc.id} name={tc.name} args={tc.arguments}")
                        approvals.append(ToolApproval(tool_call_id=tc.id, approve=True))
                    if approvals:
                        await project_client.agents.runs.submit_tool_outputs(
                            thread_id=thread_id, run_id=run_id, tool_approvals=approvals
                        )
                        print("[poll] Submitted tool approvals.")
            else:
                # Debug: Inspect run steps if stuck
                run_steps = [s async for s in project_client.agents.run_steps.list(thread_id=thread_id, run_id=run_id)]
                if run_steps:
                    for step in run_steps:
                        sid = getattr(step, "id", None)
                        sstatus = getattr(step, "status", None)
                        print(f" step: id={sid} status={sstatus}")
                        step_details = getattr(step, "step_details", None)
                        if step_details:
                            tool_calls = getattr(step_details, "tool_calls", None)
                            if tool_calls:
                                for call in tool_calls:
                                    print(f" tool_call id={getattr(call,'id',None)} name={getattr(call,'name',None)} args={getattr(call,'arguments',None)} output={getattr(call,'output',None)}")

            await asyncio.sleep(1)

This code works and auto-approves tool calls for MCP tools. But when using ConnectedAgentTool(), the run never enters requires_action, so no approvals are requested and the orchestration halts.
Environment:
azure-ai-agents==1.2.0b4
azure-ai-projects==1.1.0b4
Python: 3.11.13
Auth: DefaultAzureCredential
Notes: MCP tools work and trigger approval normally when directly attached, and individual agents function as expected in standalone runs.
Can anyone help here?
75 Views 0 likes 0 Comments
Issue when connecting from SPFX to Entra-enabled Azure AI Foundry resource
We have been successfully connecting our chat bot from an SPFX to a chat completion model in Azure, using key authentication. We now have a requirement to disable key authentication. This is what we've done so far:
Disabled API authentication in the resource.
Gave the SharePoint Client Extensibility Web Application Principal the "Cognitive Services OpenAI User", "Cognitive Service User" and "Cognitive Data Reader" permissions in the resource.
In the SPFX, added the following to package-solution.json (and approved it in the SharePoint admin site):

    "webApiPermissionRequests": [
      {
        "resource": "Azure Machine Learning Services",
        "scope": "user_impersonation"
      }
    ]

To connect to the chat completion API we're using fetchEventSource from '@microsoft/fetch-event-source', so we're getting a Bearer token using AadTokenProviderFactory from "@microsoft/sp-http", e.g.:

    // preceded by some code to get the tokenProvider from aadTokenProviderFactory
    const token = await tokenProvider.getToken('https://ai.azure.com');
    const url = "https://my-ai-resource.openai.azure.com/openai/deployments/gpt-4o/chat/completions?api-version=2025-01-01-preview";
    await fetchEventSource(url, {
      method: 'POST',
      headers: {
        Accept: 'text/event-stream',
        'Content-type': 'application/json',
        Authorization: `Bearer ${token}`
      },
      body: body,
      ...// truncated

We added the users (let's say, email address removed for privacy reasons) in the resource as an Azure AI User. When we try to get this to work, we get the following error:
The principal `email address removed for privacy reasons` lacks the required data action `Microsoft.CognitiveServices/accounts/OpenAI/deployments/chat/completions/action` to perform `POST /openai/deployments/{deployment-id}/chat/completions` operation.
How can we make this work? Ideally we would prefer the SPFX principal to make the request to the chat completion API, without having to add end users to the resource through IAC, but my understanding is that AadTokenProviderFactory only issues delegated access tokens.
48 Views 0 likes 0 Comments
Responses API for gpt-4.1 in Europe
Hello everyone! I'm writing here to figure out the availability of the "responses" APIs in European regions: https://learn.microsoft.com/en-us/azure/ai-foundry/openai/how-to/responses?tabs=python-key
I'm trying to deploy a /responses endpoint for the model we are currently using, gpt-4.1, since I've read that the /completions endpoint will be retired by OpenAI starting from August 2026. Our application is currently migrating all its API calls from completions to responses, and we were wondering if we could already do the same for our clients in Europe, who have to comply with GDPR and currently use our Azure subscription.
In the page linked above, I can see some regions that would fit our needs, in particular:
francecentral
norwayeast
polandcentral
swedencentral
switzerlandnorth
But then I can also read "Not every model is available in the regions supported by the responses API.", which probably already answers my question: from the Azure AI Foundry Portal, I wasn't able to deploy such an endpoint in those regions, except for the o3 model. For the 4.1 model, only the completions endpoint is listed, while searching for "Responses" (in "Deploy base model") returns only o3 and these others:
Can you confirm that I'm not doing anything wrong (looking in the wrong place to deploy such an endpoint), and that the gpt-4.1 responses API is currently not available in any European region? Do you think it's realistic that it will be soon (like in 2025)? Any European region would work for us, in the "DataZone-Standard" type of distribution, which already ensures GDPR compliance (no need for a "Standard" one in one specific region).
Thank you for your attention, have a nice day or evening,
122 Views 0 likes 0 Comments
Chaining and Streaming with Responses API in Azure
Responses API is an enhancement of the existing Chat Completions API. It is stateful and supports agentic capabilities. As a superset of the Chat Completions API, it continues to support chat completions functionality. In addition, reasoning models such as GPT-5 deliver better model intelligence through the Responses API than through Chat Completions. It accepts flexible input, supporting a range of input types. It is currently available in the following regions on Azure and can be used with all the models available in the region. The API supports response streaming, chaining, and function calling.
In the examples below, we use the gpt-5-nano model for a simple response, a chained response, and streaming responses. To get started, update the installed openai library.

    pip install --upgrade openai

Simple Message
1) Build the client with the following code

    from openai import OpenAI

    client = OpenAI(
        base_url=endpoint,
        api_key=api_key,
    )

2) Send the request. The call returns a response object whose id can then be used to retrieve the message.

    # Non-streaming request
    resp_id = client.responses.create(
        model=deployment,
        input=messages,
    )

3) The message is retrieved using the response id from the previous step

    response = client.responses.retrieve(resp_id.id)

Chaining
For a chained message, the extra step is sharing the context. This is done by sending the previous response id in the subsequent request.

    resp_id = client.responses.create(
        model=deployment,
        previous_response_id=resp_id.id,
        input=[{"role": "user", "content": "Explain this at a level that could be understood by a college freshman"}]
    )

Streaming
A different function call is used for streaming queries, and the streamed response has to be consumed until the end of the event stream.

    text_out = []  # collects the streamed text deltas
    with client.responses.stream(
        model=deployment,
        input=messages,  # structured messages
    ) as s:
        for event in s:
            # Accumulate only text deltas for clean output
            if event.type == "response.output_text.delta":
                delta = event.delta or ""
                text_out.append(delta)
                # Echo streaming output to console as it arrives
                print(delta, end="", flush=True)

The code is available at the following GitHub link: https://github.com/arunacarunac/ResponsesAPI
Additional details are available at the following link: Azure OpenAI Responses API - Azure OpenAI | Microsoft Learn
191 Views 0 likes 0 Comments
Do you have experience fine tuning GPT OSS models?
Hi, I found this space called Affine. It is a daily reinforcement learning competition, and I'm participating in it. One thing I am looking for collaboration on is fine-tuning GPT OSS models to score well on the evaluations. I am wondering if anyone here is interested in mining? I feel that people here would have some good reinforcement learning tricks. These models are evaluated on a set of RL environments, with validators looking for the model that dominates the Pareto frontier. I'm specifically looking for improvements in the coding deduction environment and the new ELR environment they made. I would like to use a GPT OSS model here, but it's hard to fine-tune these models with GRPO. Here is the information I found on Affine: https://www.reddit.com/r/reinforcementlearning/comments/1mnq6i0/comment/n86sjrk/?utm_source=share&utm_medium=web3x&utm_name=web3xcss&utm_term=1&utm_content=share_button
70 Views 0 likes 0 Comments
Image Dataset in Azure AI Asking for Tabular Format During Training
Hi everyone, I’m working on an image-based project in Azure AI. My images (PNG) are stored in Azure Blob Storage, and I registered them as a folder in Data Assets. When I start training, the UI asks for a tabular dataset instead. Since my data is images, I’m unsure how to proceed or whether I need to convert or register the dataset differently. What’s the correct way to set up image data for training in Azure AI?
63 Views 0 likes 0 Comments
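On the image dataset question above: if the training screen is Automated ML for images, my understanding is that it expects an MLTable data asset built from a JSONL annotation file (one JSON object per image, with an image_url and a label) rather than a plain folder asset, which may be why the UI asks for a tabular dataset. Below is a minimal sketch of generating those lines for a classification task. The function name, the datastore prefix, and the path-to-label mapping are hypothetical, and the exact fields required should be checked against the Azure Machine Learning documentation.

```python
import json


def build_annotation_lines(labels_by_path, datastore_uri_prefix):
    """Return one JSON string per image (JSONL rows) for a classification task.

    labels_by_path: dict mapping an image path relative to the datastore
        folder to its class label (hypothetical input format).
    datastore_uri_prefix: URI of the folder holding the images
        (a placeholder here, not a real datastore).
    """
    prefix = datastore_uri_prefix.rstrip("/")
    lines = []
    # Sort by path so the output order is stable between runs.
    for relative_path, label in sorted(labels_by_path.items()):
        record = {
            "image_url": f"{prefix}/{relative_path.lstrip('/')}",
            "label": label,
        }
        lines.append(json.dumps(record))
    return lines


# Hypothetical example: two PNG files already uploaded to Blob Storage.
example_lines = build_annotation_lines(
    {"cats/a.png": "cat", "dogs/b.png": "dog"},
    "azureml://datastores/hypothetical_store/paths/images/",
)
print("\n".join(example_lines))
```

The resulting lines would be written to a .jsonl file and referenced from an MLTable definition, which is then registered as the training data asset in place of the folder.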
Events
Recent Blogs
- The architectural problem In any non-trivial GenAI platform, you end up managing a fleet of models. Cheap models for classification and light chat. Reasoning models for multi-step tasks. Frontier m... Apr 28, 2026 151 Views 0 likes 0 Comments
- The problem I wanted to explore Most agentic AI systems today follow a familiar pattern: you design the workflow, you decide which agents exist, you wire the communication between them. That works ... Apr 28, 2026 124 Views 0 likes 0 Comments