ai agents
138 Topics

Securing data and access in the era of AI with Microsoft Entra and Microsoft Purview
As organizations move from experimenting with AI to deploying it at scale, securing sensitive data, access, and AI usage has become mission critical. In this series, Microsoft experts will show how Microsoft Entra and Microsoft Purview help you:

• Protect sensitive data across networks, apps, and AI interactions
• Govern access for users, applications, and AI agents
• Reduce risk while enabling innovation at scale

Whether you're shaping your security strategy or implementing controls, you’ll walk away with the guidance you need to secure data and access to AI as one unified strategy.

DATE       TIME (PDT)   TOPIC
July 21    9:00 AM      Secure the age of AI: Redefining trust, data and access
July 22    9:00 AM      Data and identity controls for the browser and network
July 23    9:00 AM      Unlock AI agents without sacrificing security

How do I participate?

Select the sessions you are interested in, then select Add to Calendar to save the date and/or the Attend button to save your spot, receive event reminders, and participate in the Q&A.

Not able to attend live? This session will be recorded and available on demand shortly after airing.

Don't see Attend or Add to Calendar? Sign in to the Tech Community to join the conversation.

Unlock AI agents without sacrificing security
AI agents are reaching into mailboxes, files, line-of-business apps, and the open web on behalf of your users, and the business wants more of them, faster. To scale agents safely, your security teams need to be able to verify each agent, govern what it can access, and enforce clear boundaries across every interaction.

Learn how Microsoft Entra helps you discover shadow AI agents, govern agent permissions, keep BYOD and endpoint-based agents in scope, and apply Conditional Access to AI prompts and responses. Then see how Microsoft Purview provides visibility into agent activity, strengthens runtime data protection, helps detect agentic risk, and supports auditability across local agents developed on GitHub Copilot CLI, Claude Code, OpenAI Codex, and OpenClaw. Walk away with practical ways to unlock AI agents while keeping access and data protection aligned with your enterprise security needs.

How do I participate?

Select Add to Calendar to save the date, then click the Attend button to save your spot, receive event reminders, and participate in the Q&A.

Not able to attend live? This session will be recorded and available on demand shortly after airing.

Don't see Attend or Add to Calendar? Sign in to the Tech Community to join the conversation.

This session is part of Securing data and access in the era of AI with Microsoft Entra and Microsoft Purview. View the full agenda for more insights to help you move from experimenting with AI to deploying it at scale, securing sensitive data, access, and AI usage.

Auto-Generated Rubric Evaluators: Building Context-Aware Evaluators for AI Agents
Authors: Shuo Qiu, Sydney Lister, Ilya Matiach, Ali Mahmoudzadeh, Salma Elshafey, José Santos, Vivek Bhadauria, Morteza Ziyadi, April Kwong

Why Your Agent Needs a Task-Specific Evaluator

Picture a customer-service agent for a telecom company. A customer messages in asking to switch plans and get a refund for last month's overcharge. The agent needs to verify the customer's identity and confirm the new plan before ending the conversation. Miss the verification step and you have a security incident. Those success criteria are specific to this one scenario. The auto-generated rubric evaluator is designed to help address this: use the context you already have to generate a task-specific rubric evaluator that returns a weighted score with per-dimension explanations and can then be reused across iterations.

How We Validated Evaluator Quality

We validate auto-generated rubric evaluators across four aspects:

• Verdict Validity: whether judgments on real cases reflect what a competent reviewer would conclude.
• Rubric Validity: whether generated rubrics capture the task requirements and failure modes.
• Manual Quality Inspection: whether judgments on real cases look right to a human reviewer.
• Reliability and Separability: whether judgments are stable across repeated runs and distinguish stronger from weaker candidate agents.

Validation Results

1. Agreement with Trusted Reference Signals

We first validate the auto-generated rubric evaluator end-to-end: we use the rubric generator to produce the rubric's dimensions, then the rubric evaluator scores each case against them. We use GPT-5.4 for both rubric generator and rubric evaluator. The first question is whether those end-to-end scores move with signals teams already trust. For example, does the rubric evaluator give lower scores to failed cases, and higher scores to successful ones?
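For reference, the "weighted score with per-dimension explanations" that these experiments examine can be pictured with a small sketch. This is only an illustration under stated assumptions: the `DimensionResult` shape, the dimension names, the weights, and the 0-to-1 scoring scale are all hypothetical and are not the evaluator's actual schema.

```python
from dataclasses import dataclass


@dataclass
class DimensionResult:
    """One rubric dimension's judgment for a single case (hypothetical shape)."""
    name: str
    weight: float       # relative importance of the dimension
    score: float        # judged score, assumed to lie in [0, 1]
    explanation: str    # short rationale from the judge


def weighted_score(results: list[DimensionResult]) -> float:
    """Return the weight-normalized average of the per-dimension scores."""
    total_weight = sum(r.weight for r in results)
    if total_weight <= 0:
        raise ValueError("rubric needs at least one positively weighted dimension")
    return sum(r.weight * r.score for r in results) / total_weight


# Hypothetical judgments for the telecom plan-change scenario.
telecom_case = [
    DimensionResult("identity_verified", 3.0, 0.0, "Agent skipped verification."),
    DimensionResult("plan_confirmed", 2.0, 1.0, "New plan was read back."),
    DimensionResult("refund_handled", 1.0, 1.0, "Refund was issued."),
]

overall = weighted_score(telecom_case)  # (3*0 + 2*1 + 1*1) / 6 = 0.5
```

Weighting the verification dimension most heavily means a skipped identity check pulls the overall score down even when every other dimension passes, which is the behavior the telecom scenario calls for.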
We start by choosing benchmarks the community already uses as reference points:

Dataset                            What It Tests
JSON Editing                       Deterministic structured-editing tasks where outputs can be checked exactly.
TauBench Telecom                   Customer-service agent tasks requiring policy following, tool use, and task completion.
The Agent Company                  Long-horizon workplace-agent tasks with multi-step tool use. We use InspectAI’s 10-case subset.
BFCL Multi-Turn Tool Calling       Multi-turn function-calling behavior across realistic tool-use scenarios.
LiveClawBench                      Open-ended web-agent tasks that require browsing, interaction, and judgment.
Retail-Agent Customer Service      Real production-style retail support conversations.

We then ask the generation pipeline to generate rubric evaluators for each scenario, and measure the correlation between the evaluator's scores and the trusted reference signals. For the three datasets with per-case reference signals, we can directly check whether the evaluator gives higher scores to successful cases than failed ones. We then create traces from different candidate agents. In these experiments, each candidate agent uses the same task setup and prompt but a different underlying model, which gives us a controlled range of stronger and weaker agent behaviors.

Because the evaluator returns a continuous score, we use receiver operating characteristic area under the curve (ROC AUC) when the trusted case-level signal can be read as success versus failure. It measures how often, when comparing a successful case with a failed case, the evaluator assigns the successful case the higher score. In these experiments, generated rubric evaluators align well with trusted signals at the case level, with ROC AUC of 0.794 on TauBench Telecom, 0.869 on The Agent Company, and 0.972 on JSON Editing.

An important goal of evaluation is for candidate agents that perform better on the reference signal to also be scored higher by the evaluator.
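The pairwise reading of case-level ROC AUC given above can be computed directly from its definition. The sketch below is a minimal illustration, not the authors' evaluation code: the function name and the example scores are made up, and counting ties as half a correct pair is an assumption borrowed from the usual Mann-Whitney formulation.

```python
def pairwise_roc_auc(scores: list[float], succeeded: list[bool]) -> float:
    """ROC AUC as the fraction of (success, failure) pairs ranked correctly.

    Ties count as half a correct pair (Mann-Whitney convention).
    """
    positives = [s for s, ok in zip(scores, succeeded) if ok]
    negatives = [s for s, ok in zip(scores, succeeded) if not ok]
    if not positives or not negatives:
        raise ValueError("need at least one successful and one failed case")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0      # successful case scored higher: correct pair
            elif p == n:
                wins += 0.5      # tie: half credit
    return wins / (len(positives) * len(negatives))


# Made-up evaluator scores and reference outcomes, for illustration only.
example_scores = [0.9, 0.8, 0.4, 0.7, 0.3]
example_outcomes = [True, True, True, False, False]
example_auc = pairwise_roc_auc(example_scores, example_outcomes)  # 5 of 6 pairs
```

In this toy data, the one successful case scored 0.4 falls below the failed case scored 0.7, so five of the six success/failure pairs are ordered correctly. The double loop is quadratic in the number of cases, which is fine for benchmark-sized sets; a rank-based formula gives the same number faster.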
Ranking at the candidate-agent level is more directly relevant when choosing among candidate agents, and it is a more forgiving test of alignment because aggregated scores are less sensitive to noise in individual judgments. We measure this with aggregate candidate-agent Spearman ρ, which checks whether the evaluator ranks candidate agents the same way as the oracle: a ρ of 1.0 means the evaluator's ranking is perfectly aligned with the oracle's, while 0 means no relationship. For BFCL and LiveClawBench, the oracle ranking comes from their official leaderboard scores. At the aggregate candidate-agent level, Spearman ρ ranges from 0.69 on The Agent Company to 0.98 on JSON Editing across all five benchmarks. Aggregation reduces per-case noise, so the candidate-agent ranking is the more relevant view when the goal is agent selection.

2. Rubric Quality on GDPVal

GDPVal is a benchmark that measures how well AI models perform real-world, economically valuable work in sectors such as government, manufacturing, and technical services. This benchmark includes a rubric for each task, authored by a domain expert, which is useful for rubric-validity measurement. We ask the rubric generator to produce a rubric for each test case, then use a separate matching judge to match the generated dimensions to the expert dimensions. This gives us two metrics for rubric quality:

• Recall. For each annotated dimension, did at least one generated dimension express a similar requirement?
• Precision. For each generated dimension, did at least one annotated dimension express a similar requirement?

Under this setup, the generated rubric achieved 72.1% recall and 86.4% precision against the expert dimensions on GDPVal tasks.

3. Manual Quality on Retail-Agent Conversations

For a real-world retail-agent customer-service dataset, we generated a rubric with six dimensions, then graded 12 conversations over those dimensions, and manually inspected every case-by-dimension judgment.
In this small sample (12 conversations), the reviewer disagreed with only one of the 72 case-by-dimension judgments. Most neutral cases involved applicability questions that the evaluator flagged inconsistently.

Reliability and Separability

Another key question is how reliable the evaluator's scores are. We look at two things: reliability (does the same case get the same score next time?) and separability (can the evaluator confidently rank two candidate agents against each other?).

Reliability

If you re-grade the same case tomorrow, do you get the same score? We measure this two ways. Single-measure intraclass correlation, ICC(3,1), measures how much of the score variance comes from real case differences rather than repeat noise. Kendall's W measures rank reliability across repeats (1.0 means the evaluator ranks cases in the same order every time). On JSON Editing, ICC(3,1) is 0.852 and Kendall's W is 0.767, which means re-running the evaluator on the same case gives similar scores in this experimental setup. TauBench Telecom shows similarly strong reliability, with ICC(3,1) of 0.85 and Kendall's W of 0.89 under the same recommended configuration.

Separability

Separability measures whether the score is decisive: when you put two candidate agents side by side, can the evaluator confidently say which one is better? We report mean pairwise bootstrap confidence, which measures ranking stability. For each pair of candidate agents, we resample cases and recompute each agent's mean evaluator score. The pair confidence is the fraction of bootstrap samples supporting the more common ordering: a value near 0.5 means the ordering is unstable, while a value near 1.0 means the evaluator consistently separates that pair. We average this across all candidate-agent pairs. The candidate-agent intervals are tight on JSON Editing and TauBench Telecom.
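The bootstrap procedure described above can be sketched in a few lines of Python. This is a rough illustration rather than the authors' implementation: the function names are hypothetical, the number of resamples is arbitrary, and it assumes both agents were scored on the same cases so that each resample draws the same case indices for both (paired resampling), which the text does not state explicitly.

```python
import random
from itertools import combinations
from statistics import mean


def pair_confidence(scores_a, scores_b, n_boot=2000, seed=0):
    """Fraction of bootstrap resamples supporting the more common ordering.

    scores_a[i] and scores_b[i] are two agents' evaluator scores on case i.
    Cases are resampled with replacement and each agent's mean is recomputed.
    """
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("both agents need scores on the same non-empty case set")
    rng = random.Random(seed)
    n_cases = len(scores_a)
    a_wins = 0.0
    for _ in range(n_boot):
        idx = [rng.randrange(n_cases) for _ in range(n_cases)]
        mean_a = mean(scores_a[i] for i in idx)
        mean_b = mean(scores_b[i] for i in idx)
        if mean_a > mean_b:
            a_wins += 1.0
        elif mean_a == mean_b:
            a_wins += 0.5          # split ties evenly between the two orderings
    frac_a = a_wins / n_boot
    return max(frac_a, 1.0 - frac_a)   # 0.5 = unstable, 1.0 = always separated


def mean_pairwise_confidence(scores_by_agent, n_boot=2000, seed=0):
    """Average pair_confidence over every pair of candidate agents."""
    pairs = list(combinations(sorted(scores_by_agent), 2))
    if not pairs:
        raise ValueError("need at least two candidate agents")
    return mean(
        pair_confidence(scores_by_agent[a], scores_by_agent[b], n_boot, seed)
        for a, b in pairs
    )
```

Taking `max(frac_a, 1 - frac_a)` is what makes the value range from 0.5 to 1.0: it reports support for whichever ordering wins more often, regardless of which agent that favors.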
Mean pairwise bootstrap confidence is 0.96 on the JSON Editing dataset and 0.95 on the TauBench Telecom dataset.

Get Started

The auto-generated rubric evaluator's results may vary depending on task design, input quality, and evaluation setup. Start with a clear, well-defined description for your evaluation in the prompt field, include as much high-quality context as possible, such as the agent definition and examples, and review the generated rubric carefully before using it. Run it against a small set of known-good and known-bad cases to understand how the score reflects different failure modes.

Try the workflow in the Foundry portal and follow the rubric evaluator tutorial. For a demo that covers Rubric in the broader observability workflow, watch the Build breakout session From observability to ROI for AI agents on any framework. For the full set of Build observability announcements, read Build 2026: From observability to ROI for AI agents on any framework.

The Employee Self-Service agent - how to find it
I see that Microsoft has put a lot of effort into marketing the Copilot Employee Self-Service Agent, but it seems it is not available for every tenant. I have already checked several small and mid-sized ones (5-2k users) and cannot find it in the templates. As I understand from what I have already read and seen, this is a template that should be available when you start building agents. Unfortunately, when I open Copilot Studio and search for ESS (the abbreviation for Employee Self-Service), I get only the two agents marked in red in the filtered results (see screenshot below). When I installed the IT Helpdesk agent, I did not see the ServiceNow HRSD topics that I need. I found the Employee-Self-Service-Agent-Developer-Kit, which contains some examples of the ServiceNow HRSD topics, but when I copy the YAML code of those topics into my agent I get references to topics that do not exist in my agent. Has anyone struggled with the same issue? Or maybe you have access to the Employee Self-Service agent and can share the basic solution/topics with me?

Michal

Introducing Durable Functions in PostgreSQL
By Abe Omorogbe, Senior PM | Pino De Candia, Principal Software Engineer | TJ Green, Principal Software Engineer

Postgres will happily store your data, run your queries, and scale with you for years. But the moment you need to do more with that data, such as running multi-step transformations, scheduling nightly rollups, generating embeddings, or waiting on an approval, you hit a wall. Postgres has no built-in way to run long-lived, fault-tolerant work.

That's why we built pg_durable, a new open-source PostgreSQL extension that brings durable execution directly into the database. With pg_durable, Postgres doesn’t just store your data, it runs long-lived, fault-tolerant workflows on it, with built-in retries, parallelism, scheduling, and recovery. Instead of stitching together PL/pgSQL functions or building external orchestration systems, you can now define and run resilient workflows entirely in your database, backed by Postgres' durability and high availability. And on Azure HorizonDB, pg_durable also powers AI pipelines, enabling production-ready data and AI workflows, end-to-end, right inside the database.

In this post, we'll cover:

• The hidden trap: blocking background work
• What pg_durable is and the DSL that drives it
• How this engine powers AI pipelines on HorizonDB
• Sample patterns worth exploring
• Getting started on HorizonDB, on your laptop, and in VS Code

🚀 Want to try it out? pg_durable ships in Azure HorizonDB, Microsoft's new PostgreSQL cloud service. The HorizonDB Preview is the fastest way to try pg_durable and AI pipelines together. Get started in HorizonDB →

[Figure: pg_durable visualization]

The hidden trap: blocking background work

Most Postgres teams eventually reach a point where they need to run critical tasks on their data: transformations, nightly aggregations, database maintenance workflows, embedding jobs, or multi-step business processes. So, they do the natural thing and try to keep that work inside Postgres.
They end up on a journey of increasing complexity and maintenance burden.

First, just run the task as a function in your database

You cram the whole workflow into one PL/pgSQL function: loop, transform, call APIs, write results, return. It looks simple until you have to run it in production. One connection stays tied up the whole time. Everything runs inside one big transaction, with long locks and no visibility into partial progress. If the connection drops or the database restarts, the whole run is gone. No per-step retries. No parallelism. No scheduling. No clean way to pause for human input.

When it fails, you move it outside

You push the workflow into an external service: a job queue, polling workers, state tables, step coordination, retry logic, crash-recovery sweeps, and cleanup jobs. What started as a few background tasks turns into a full distributed system. Before you’ve even touched the business logic, you’re building and operating infrastructure just to coordinate work that’s still fundamentally tied to your data.

Both paths are workarounds for the same missing primitive: durable, asynchronous background work that lives where your data lives. That's the gap pg_durable fills.

What pg_durable actually is

pg_durable is a Postgres extension that consists of a DSL (domain-specific language) and the duroxide runtime hosted in a Postgres background worker. You describe a workflow as a small SQL expression, call df.start(...), and get an instance ID back immediately. The work runs off to the side in a background worker, so it never blocks your connection or transaction, and you can check progress later with df.status() and df.result(). The execution state lives in Postgres, which means it benefits from the database’s durability, HA, backups, and recovery. Additionally, the workflow definition does not have to live in the database: your application can send it to df.start(...) over a regular Postgres connection.
[Figure 2: pg_durable orchestration of worker and schema]

Because execution is asynchronous, pg_durable automatically breaks a workflow into discrete steps. Each step runs in its own session and transaction, commits its progress, and hands off to the next instead of keeping one giant transaction open. Steps are checkpointed in Postgres and recovered by deterministic replay, so workflows survive crashes, restarts, and failovers and resume where they left off. If a step fails, only that step retries.

The whole thing is expressed through a tiny DSL of composable operators:

Operator     Meaning
~>           Sequential. Run this, then that
&            Parallel. Fan out, wait for all
|            Race. Fan out, take the first to finish
?> / !>      Conditional. If / else
@>           Loop. Repeat durably, survive restarts
|=>          Capture a step's result into a variable (reuse with $)

Advanced Functions

df.if()                  Conditional branch
df.loop()                Repeat statements
df.join()                Execute in parallel, wait for all
df.http()                Call an allowlisted endpoint
df.wait_for_schedule()   Cron-style timing
df.wait_for_signal()     Pause for an external event

Read more about all operators and functions in pg_durable

Without pg_durable vs. with pg_durable

The hand-rolled version of "run three aggregations in parallel, then refresh a dashboard with retries and crash recovery" usually means 300+ lines of queue tables, polling workers, state-machine rows, per-step retry logic, crash-recovery sweeps, and cleanup jobs. Plus, the runbook to operate it.

The pg_durable version:

    SELECT df.start(
        'SELECT count(*) FROM users'
        & 'SELECT count(*) FROM orders'
        & 'SELECT sum(amount) FROM orders'
        ~> 'REFRESH MATERIALIZED VIEW metrics',
        'refresh-dashboard'
    );

You write the SQL. pg_durable owns the queue, the state, the coordination, the retries, and the crash recovery.
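To see why per-step checkpoints make that crash recovery cheap, here is a toy Python model. It is not how pg_durable or duroxide is implemented: real deterministic replay re-executes the orchestration against recorded history, whereas this sketch only uses a plain dictionary as a stand-in for state persisted in Postgres and skips any step already recorded there. All names in it are hypothetical.

```python
def run_with_checkpoints(steps, checkpoints, crash_before=None):
    """Run named steps in order, skipping any already present in `checkpoints`.

    `checkpoints` stands in for durable state; `crash_before` simulates a
    failure just before the named step starts.
    """
    for name, action in steps:
        if name in checkpoints:
            continue                      # finished in an earlier run
        if name == crash_before:
            raise RuntimeError(f"simulated crash before {name}")
        checkpoints[name] = action()      # record the result once the step is done
    return checkpoints


executed = []                             # records every real execution of a step


def make_step(name):
    def action():
        executed.append(name)
        return f"{name}: done"
    return action


workflow = [(n, make_step(n)) for n in ("count_users", "count_orders", "refresh_view")]
store = {}                                # stand-in for state kept in the database

try:
    run_with_checkpoints(workflow, store, crash_before="refresh_view")
except RuntimeError:
    pass                                  # the "crash"; finished steps stay in `store`

run_with_checkpoints(workflow, store)     # the "restart" only runs refresh_view
```

After the restart, `executed` lists each step exactly once: the two counts were not repeated, because their results were already checkpointed when the simulated crash happened.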
Two ways to use pg_durable

1: Use pg_durable directly (works on Azure HorizonDB or any Postgres 17)

Enable it and start orchestrating:

    CREATE EXTENSION pg_durable;

    SELECT df.start($$ SELECT 'Hello, durable world!' AS message $$);
    -- returns an instance ID immediately; the worker runs it asynchronously

From there you compose: sequential pipelines, conditional branches, races for timeout-or-result, variable passing between steps, human-in-the-loop approvals, and scheduled maintenance, all in SQL, close to the data, with no new infrastructure. This is the "just use Postgres" answer to a problem teams usually solve by leaving Postgres. Because it's open source under the permissive PostgreSQL License, you can clone the repo and run it on your laptop, your server, or any cloud.

2: AI pipelines (HorizonDB capability)

On HorizonDB, pg_durable becomes the foundation for something even more approachable: a managed, declarative AI pipeline surface in the azure_ai extension. pg_durable gives you the durable execution engine, while the ai.* API gives you an AI-shaped model of sources, steps, sinks, and triggers that compile into a durable graph.

Traditional app-tier embedding pipelines fail in predictable ways: a transient API error mid-batch with no shared checkpoint, a worker that crashes after writing chunks but before marking the parent row processed, no clean way to re-embed just the rows that changed. Move that logic into HorizonDB and the source, the steps, the sink, and the run history are all SQL, protected by the same transactions, backups, and PITR (point-in-time recovery) your data already has.
A complete chunk → embed AI pipeline is one definition:

    SELECT ai.create_pipeline(
        name    => 'ai_pipeline',
        source  => ai.table_source(table_name => 'documents_ai_pipeline'),
        steps   => ARRAY[
            ai.chunk(input => 'content'),
            ai.embed(model => 'default-embedding', input => 'chunk_text', dimensions => 1536)
        ],
        trigger => 'on_change',
        sink    => ai.table_sink('documents_ai_pipeline_output')
    );

    SELECT ai.run('ai_pipeline');

Each AI step becomes a durable node, so if ai.embed() fails, ai.chunk() doesn’t run again. And with trigger => 'on_change', the pipeline runs automatically as rows change, embedding only what’s new. Add a DiskANN index on the resulting table, and you have production-ready vector search end to end, entirely inside the database.

Where pg_durable fits and where it doesn't

If you've used external orchestrators such as Temporal or Airflow, your first reaction is probably: why would I put control flow in my database? Fair question. pg_durable isn't trying to be a universal orchestrator.

Reach for pg_durable when the workflow is tightly coupled to Postgres state. The rows it reads and writes live in the same database, it benefits from the database's own durability, backups, and PITR, and you'd rather not stand up a separate system to coordinate work that never leaves the data tier. Think: embedding pipelines, ETL jobs, scheduled maintenance, and queue-style background jobs.

Reach for a dedicated orchestrator when the workflow's center of gravity is outside Postgres, fanning across heterogeneous services, or running arbitrary application logic that does not map cleanly to SQL steps, branching, loops, or HTTP calls.

Get started

On Azure HorizonDB

    CREATE EXTENSION IF NOT EXISTS pg_durable;

    -- Execute a simple SQL query as a durable function
    SELECT df.start($$ SELECT 'Hello, durable world!' AS message $$);
    -- Returns: a1b2c3d4 (8-character instance ID)

    -- Get result of a specific instance
    SELECT df.result(<ID>);

That's it: submit, walk away, inspect.
Read the documentation for more details.

In VS Code, with the PostgreSQL extension

A dense one-liner of ~>, &, and |=> is precise once it clicks, but the learning curve is real, so flatten it with tooling. Install the PostgreSQL extension for VS Code from the Marketplace:

• Connect to HorizonDB or your local Postgres directly from the extension.
• Let Copilot write the SQL. The pg-durable-sql skill turns a plain-English description ("every night, archive orders older than 90 days") into correct pg_durable syntax.
• Run it and watch it. The extension renders pg_durable workflows and azure_ai pipelines as live graphs, both the definition and each run, so you can see every step, its timing, and exactly where a failure happened.

Authoring, execution, run visualization, and inspection all happen in one window, and the same tooling works against any Postgres, not just HorizonDB.

On your laptop

Prefer to run it yourself? Clone microsoft/pg_durable, use the Codespace prebuild or VS Code Dev Container, and add the extension on any Postgres 17.

Sample patterns worth exploring

The scenario guide has a full catalog of scenarios; however, these are the three I would start with.

ETL Pipeline: a multi-step data transformation where each step must be completed before the next begins. Failures should stop the pipeline.

    SELECT df.start(
        'DELETE FROM target WHERE loaded_at < now() - interval ''7 days'''          -- Step 1: Cleanup old
        ~> 'UPDATE staging SET processed_at = now() WHERE processed_at IS NULL'     -- Step 2: Mark staging
        ~> 'INSERT INTO target (data, source_id) SELECT data, source_id FROM staging WHERE processed_at IS NOT NULL',  -- Step 3: Load
        'etl-pipeline'  -- Label for easy identification
    );

If the database restarts mid-backfill, it picks up from the last checkpointed batch, not row zero. See full example

Scheduled Data Sync: poll an external API or run a job on a schedule (hourly, daily, every 30 minutes). The job should run forever and survive restarts.
(See full example):

    -- Scheduled sync: fetch data every 30 minutes (runs forever)
    SELECT df.start(
        @> (  -- @> creates an eternal loop
            -- Fetch from external API
            (df.http(
                'https://httpbingo.org/json',
                'GET'
            ) |=> 'response')
            -- Store the response
            ~> 'INSERT INTO external_data_sync (data) VALUES ($response::jsonb)'
            -- Wait for next scheduled run
            ~> df.wait_for_schedule('*/30 * * * *')  -- Cron: every 30 minutes
        ),
        'scheduled-data-sync'
    );

Human-in-the-loop approval: auto-apply routine changes, pause the risky ones until a person signals approval (See full example):

    SELECT df.start(
        'SELECT amount > 10000 AS needs_review FROM invoices WHERE id = 42' |=> 'risky'
        ?> (
            df.wait_for_signal('invoice-42')
            ~> 'UPDATE invoices SET status = ''paid'' WHERE id = 42'
        )
        !> 'UPDATE invoices SET status = ''paid'' WHERE id = 42',
        'invoice-approval'
    );

The workflow simply waits minutes or days until a reviewer releases it with the matching signal, then resumes.

The community is already running with it

pg_durable launched as open source and the community is already kicking the tires. The project was a top article on Hacker News on launch day and reached 1.7K stars on GitHub within its first few days. Franck Pachot (PostgreSQL community veteran) also published an independent walkthrough, Getting Started with pg_durable: durable workflows inside PostgreSQL, within days of release.

The repo is actively developed, and the maintainers are reading every issue and PR. If you want improvements in our DSL ergonomics, say so. If you want an operator that doesn't exist yet, open an issue. If you've got a scenario we haven't covered, send a PR. The syntax, the docs, and the rough edges all get better when people who run Postgres in production push back.

So, clone it, and build something real. If you find rough edges, open an issue or send a PR at microsoft/pg_durable. We think you'll be surprised by how much it can take.
Learn more

• pg_durable on GitHub
• Durable Functions on HorizonDB
• AI pipelines on HorizonDB

Copilot Podcast Creations Stuck in “Creating” State
My Copilot podcast creations are stuck in the “Creating” state and will not delete. The delete button is greyed out. I have already tried closing all browsers, restarting my iPad, phone, and PC, and the issue persists across all devices. It looks like the jobs are stuck in the backend queue and cannot be cleared from my side. I also attempted to open a support chat, but the service request auto‑closed before an agent joined. Is there a way to clear these stuck podcast creations on the backend, or can someone from Microsoft confirm if this is a known issue?

File Uploads Not Passed to Custom Engine Agent in Microsoft 365 Copilot Chat
Hi all, I'm working with a custom agent built in Copilot Studio (full authoring experience — topics, knowledge sources, agent flows) published to both the Microsoft Teams channel and the Microsoft 365 channel. I've noticed a significant UX discrepancy when it comes to file and image attachments, and I want to confirm my understanding and check whether any workaround or roadmap item exists. What works: ✅ File/image uploads work as expected in the Copilot Studio test pane ✅ File/image uploads work in Teams chat when interacting with the agent What doesn't work: ❌ File/image uploads do not reach the agent when interacting via the Microsoft 365 Copilot app (both the desktop app and the web experience at microsoft365.com) The UX problem: The M365 Copilot app presents a "+" button in the chat input area with options including "Upload" and "Take screenshot." Users naturally assume these options work. The file even appears as an attachment in the sent message — but the agent never receives it. There's no warning, error, or indication to the user that the attachment was silently dropped. This creates a misleading experience, particularly for end users who have no visibility into the channel behavior differences. What I've found so far: I'm aware this is documented as a known issue for custom engine agents in the Microsoft 365 Copilot Extensibility Known Issues page: "File attachments — Users can't upload files in agent chats and the agent can't return files for download." I also found a related GitHub issue (OfficeDev/microsoft-365-agents-toolkit #15325) where a Microsoft team member confirmed this is a "Copilot platform shortage" — not an Agents Toolkit issue — with no published ETA. My questions for the community and any Microsoft product team members: Is there any currently supported workaround to enable file/image input for a Copilot Studio agent running in the M365 Copilot app (desktop or web)? 
For example, any manifest configuration, agent settings, or alternate approach? Is this limitation being actively worked on? Is there a roadmap item or Microsoft 365 feature ID that can be tracked for when file attachment support is extended to custom engine agents in the M365 Copilot chat experience? Is the UI behavior (showing upload options that don't work) being addressed separately? Even if full file processing isn't ready, a visible warning or disabled state in the UI would significantly reduce user confusion. Any insight from others who have hit this, or from Microsoft PMs, is appreciated. Happy to share more configuration details if helpful.

Thanks!
Brian

Copilot Studio Agent resetting when processing PDF drawings (300MB+) via Claude 4.6 Sonnet
Hello everyone,

I am building an automated drawing review verification agent inside Copilot Studio using the Claude 4.6 Sonnet model. The goal of the agent is to read a comments package (20-40MB) and verify whether those design comments were successfully incorporated into a milestone drawing set (300MB–400MB).

When testing this workflow natively within Claude, the model handles the token load perfectly and returns an accurate compliance/incorporation summary within approximately 20 minutes. However, when running the exact same agent setup within Copilot Studio, the conversational canvas repeatedly crashes and resets the session. I suspect I am hitting the 100-second synchronous conversational timeout or overloading the chat runtime payload limits due to the massive file sizes.

Because of corporate compliance policies, this agent must live within our Microsoft tenant so it can be scaled across our operations team via Microsoft 365. How can I get Copilot Studio's performance to match Claude's, given that it uses the same model? I am fairly new to working with AI but am willing to explore any avenue, because a solution would save my colleagues a lot of time.

Thanks in advance for any insights!

Platform Issue: Agent-Level MCP Initialization Blocks All Topics for Users Without Connection Access
I’m writing to report a significant platform behavior issue we’ve encountered with MCP (Model Context Protocol) server initialization in Copilot Studio that is severely impacting end-user experience.

ENVIRONMENT
• Platform: Microsoft Copilot Studio
• Deployment Channel: Microsoft Teams
• Agent Type: Custom Agent (EdiSENSE DEV)
• MCP Server: Atlassian Confluence-Jira MCP
• Affected Users: Users without Confluence/Jira access entitlements

PROBLEM SUMMARY
When an MCP server is registered at the agent level in Copilot Studio, it appears to be initialized globally before any topic routing occurs. If a user’s MCP connection is in a Not Connected state, even if their query has absolutely nothing to do with that MCP server, the agent gets stuck in an authentication loop, repeatedly prompting:

“Let’s get you connected first. Open connection manager to verify your credentials. Once the connection is ready, retry your request.”

The user is shown Retry / Cancel buttons, and the agent never proceeds to route the query to the appropriate topic.

STEPS TO REPRODUCE
1. Register an MCP server (e.g., Confluence-Jira) at the agent level in Copilot Studio
2. Deploy the agent to Microsoft Teams
3. Log in as a user who does NOT have access to the MCP-connected service (e.g., no Confluence license)
4. Ask the agent any question completely unrelated to the MCP service
5. Observe: Agent does not route to any topic.
Instead, it loops on the MCP connection prompt indefinitely.

EXPECTED BEHAVIOR
• MCP initialization failure for a specific service should NOT block unrelated topic routing
• The agent should degrade gracefully: if an MCP connection is unavailable for a user, it should skip that MCP and continue routing the query to relevant topics
• There should be a way to scope MCP connections to specific topics, or at minimum, mark them as optional/non-blocking

ACTUAL BEHAVIOR
• Agent-level MCP initialization blocks the entire conversation flow
• Users without MCP access cannot use ANY functionality of the agent, even features completely unrelated to the MCP service
• There is no graceful fallback or bypass mechanism available to agent builders

BUSINESS IMPACT
This is a critical gap for enterprise deployments where:
• Not all users have access to every integrated service
• Agents serve a broad user base with varying entitlements
• Admins have no control over MCP initialization order or failure handling

In our case, a large portion of our Teams users are completely locked out of the agent’s core functionality simply because they don’t have a Confluence license, even though they never intended to use Confluence-related features.

FEATURE REQUEST / SUGGESTED RESOLUTION
1. Allow MCP servers to be scoped at the topic level, not just agent level
2. Introduce an optional flag for MCP connections so initialization failure is non-blocking
3. Provide agent builders with a connection status condition node in the topic flow to handle MCP failures gracefully
4. At minimum, allow the Cancel button in the auth prompt to fall through to normal topic routing

I’d appreciate any guidance on whether there is a current workaround, or if this is a known limitation on the roadmap for resolution.

Thank you for your time and support.