Excel Blog

Building Agent Mode in Excel

tmobrien
Microsoft
Sep 29, 2025

How we designed an intelligent spreadsheet agent by combining advanced reasoning with Excel’s dynamic calculation engine

Excel is the world’s most trusted canvas for working with data, powering everything from household budgets to Fortune 500 companies, scientific research, operational planning, and classroom learning. It’s where millions turn to think, plan, and build. Agent Mode takes that impact even further, unlocking expert-level capabilities and making advanced analysis, modeling, and automation approachable for everyone, across every domain.

Agent Mode lets you describe a task in natural language and then works with you to plan, reason, iterate, and validate the outcome. After we introduced Copilot in Excel, it quickly became clear that our users wanted more: richer insights and more direct action on the sheet. Agent Mode aims to deliver on these expectations with a resilient experience that works across domains and data shapes, taking meaningful action directly in your workbook.

We’ve developed Agent Mode to take advantage of the full richness of Excel artifacts, including table structures, formula syntax, dynamic arrays, PivotTables, charts, and more. It can create workbooks that are refreshable, auditable, and verifiable. This leap is powered by advances in our reasoning engine and the deeper expression of Excel as a rich modeling language. These breakthroughs allow Agent Mode to not only generate and execute solutions but also evaluate results, fix issues, and repeat the process until the outcome is verified.

We evaluated Agent Mode on the complete set of 912 SpreadsheetBench instructions and obtained an accuracy rate of 57.2%. In our testing environment, Agent Mode makes direct workbook modifications via Excel APIs in a JavaScript runtime. We measure accuracy using the script provided by the SpreadsheetBench authors, which grades output using the open-source openpyxl library. For evaluation on Claude and Shortcut.ai, we manually ran the SpreadsheetBench tasks (including the answer location information needed for reliable evaluation) and downloaded the Excel files that were produced. These downloaded files were then graded using the same evaluation script. Note that our evaluation with Claude completed on 895 of 912 instructions. Accuracy numbers were calculated using only completed tasks. All OpenAI benchmark results were originally published by OpenAI.
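As a back-of-the-envelope reading of the headline number, the short Python sketch below computes accuracy as correct tasks divided by completed tasks. It assumes the 57.2% figure is rounded to one decimal place and measured over all 912 instructions; the post does not state the raw count, so the count it recovers is an inference, not a published result. The function name `accuracy` is hypothetical.

```python
def accuracy(correct: int, completed: int) -> float:
    """Percentage of completed tasks that were graded correct, to one decimal place."""
    return round(100 * correct / completed, 1)


# Assumption: 57.2% is a one-decimal rounding over all 912 instructions.
# Find every whole number of correct tasks that would round to that figure.
consistent = [c for c in range(913) if accuracy(c, 912) == 57.2]
print(consistent)  # [522]
```

The same function covers the partial run mentioned above: when only 895 of 912 instructions complete, the denominator passed in is 895, not 912.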

We measure Agent Mode on both our internal evaluation sets and the public SpreadsheetBench benchmark. Our results on SpreadsheetBench place Agent Mode at the leading edge of current systems, accurately completing 57.2% of the benchmark’s tasks.

But we want to be clear: we don’t optimize for benchmarks, we optimize for real user jobs in Excel. That means solving messy, ambiguous, and complex tasks that reflect how people actually work. And while SpreadsheetBench is a strong signal, it doesn’t capture everything that makes Excel powerful — like dynamic arrays, PivotTables, charts, and formatting — or the customer need for refreshable, auditable, and verifiable solutions. That’s why we have also developed internal evaluation sets, AI grading, and user feedback loops to guide improvements.

We also acknowledge that we have plenty of room for improvement, particularly around things like formatting and presentation-worthy layouts. But we believe our foundation is strong, and the direction is clear: Agent Mode is here to make Excel more powerful, more intuitive, and more helpful than ever before.

Designing an Intelligent Spreadsheet Agent

At the center of Agent Mode is a reasoning and reflection loop — powered by the latest generation of advanced reasoning models — that can interact directly with Excel workbooks. Rather than jumping straight into action, our system generates model-ready context from a given workbook and leverages an advanced reasoning model to begin planning for a given task. The system then interacts with the workbook by writing and executing code to carry out that plan, reflecting on the results, and evaluating whether the outcome matches the intent. If gaps remain, the loop continues: revising the strategy, pulling in additional context, and exploring alternative approaches. This cycle of planning, execution, and reflection continues until the system determines the task is complete. By combining planning with reactivity, the agent can chart a path, adjust when needed, and ultimately deliver solutions that feel intentional and well thought out. 
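The plan, execute, reflect cycle described above can be sketched in a few lines of Python. This is an illustrative toy, not the production implementation: `run_agent_loop`, `LoopResult`, and the `plan`, `execute`, and `reflect` callables are hypothetical names, and a plain dictionary stands in for the workbook.

```python
from dataclasses import dataclass, field


@dataclass
class LoopResult:
    done: bool
    iterations: int
    history: list = field(default_factory=list)


def run_agent_loop(task, plan, execute, reflect, max_iterations=5):
    """Plan, execute, and reflect until reflect() reports success or the budget runs out."""
    history = []
    feedback = None
    for iteration in range(1, max_iterations + 1):
        steps = plan(task, feedback)           # revise the plan using the last reflection
        outcome = execute(steps)               # carry out the plan against the workbook
        ok, feedback = reflect(task, outcome)  # does the outcome match the intent?
        history.append((steps, outcome, ok))
        if ok:
            return LoopResult(True, iteration, history)
    return LoopResult(False, max_iterations, history)


# Toy stand-ins: the first plan is deliberately incomplete, so one revision is needed.
workbook = {"A1": 2, "A2": 3}


def plan(task, feedback):
    return ["A1"] if feedback is None else ["A1", "A2"]


def execute(cells):
    return sum(workbook[c] for c in cells)


def reflect(task, outcome):
    expected = sum(workbook.values())
    return outcome == expected, f"expected {expected}, got {outcome}"


result = run_agent_loop("total column A", plan, execute, reflect)
print(result.done, result.iterations)  # True 2
```

The iteration cap is a design choice in this sketch: it keeps the loop from running indefinitely when reflection never reports success.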

The reasoning engine of our system architecture is model-agnostic by design, allowing for rapid integration of new models as they become available. Loose coupling between our reasoning and workbook interaction layers allows us to quickly swap in and evaluate new models.
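One common way to get this kind of loose coupling is to have the workbook-facing code depend only on a small interface. The Python sketch below shows the idea with `typing.Protocol`; the `ReasoningModel` interface, the two stand-in models, and `WorkbookAgent` are all hypothetical and say nothing about how the real system is wired.

```python
from typing import Protocol


class ReasoningModel(Protocol):
    """Anything with a complete() method can serve as the reasoning engine."""

    def complete(self, prompt: str) -> str: ...


class EchoModel:
    def complete(self, prompt: str) -> str:
        return f"echo:{prompt}"


class UpperModel:
    def complete(self, prompt: str) -> str:
        return prompt.upper()


class WorkbookAgent:
    """Workbook interaction layer; it never names a concrete model class."""

    def __init__(self, model: ReasoningModel):
        self.model = model

    def plan(self, task: str) -> str:
        return self.model.complete(f"Plan: {task}")


# Swapping models is a one-line change at construction time.
print(WorkbookAgent(EchoModel()).plan("sum sales"))   # echo:Plan: sum sales
print(WorkbookAgent(UpperModel()).plan("sum sales"))  # PLAN: SUM SALES
```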

An overview of our Agent Mode system architecture.

Managing spreadsheet context

Excel workbooks are living systems. They're often large, constantly changing, and filled with rich objects like PivotTables, slicers, and charts. For an agent, trying to absorb every detail all at once is simply impractical. Passing the entire dataset into context, along with the metadata for every object, would overwhelm any current model. Even exposing the thousands of read APIs Excel provides is far too heavy-handed. Instead, the agent approaches the workbook strategically: it pulls in just the pieces of context it needs, when it needs them, navigating the complexity step by step. This makes the agent not just a passive processor of data, but an active explorer of your workbook’s inner workings.

To enable this selective exploration, we’ve developed a document context producer that operates within a coordinated push-and-pull system. On the push side, the document context producer proactively sends a compact “blueprint” of the workbook along with the user’s prompt — a summary of spatial layout, values, objects, and the formula dependency graph — encoded as JSON for complex objects and Markdown for tabular data. When deeper inspection is required, the reasoning engine can then request and pull additional information on demand, ensuring it can always operate with the context it needs.

This hybrid design balances completeness with efficiency and lays the foundation for future improvements around caching, indexing, and search that will make context retrieval faster and more robust.

Our Document Context Producer translates complex workbooks into model-ready context.
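To illustrate the push-and-pull idea, here is a minimal Python sketch. It assumes a workbook can be represented as a dictionary of sheets, each a list of rows with headers first; the class and method names (`DocumentContextProducer`, `blueprint`, `pull_range`) are hypothetical, and a real blueprint would also cover objects and the formula dependency graph.

```python
import json


class DocumentContextProducer:
    def __init__(self, sheets):
        # sheets: {sheet name: list of rows, where the first row holds the headers}
        self.sheets = sheets

    def blueprint(self):
        """Push side: a compact JSON summary sent along with the user's prompt."""
        summary = {
            name: {"rows": len(rows) - 1, "columns": rows[0]}
            for name, rows in self.sheets.items()
        }
        return json.dumps(summary)

    def pull_range(self, sheet, first_row, last_row):
        """Pull side: return data rows [first_row, last_row) as a Markdown table."""
        header, *data = self.sheets[sheet]
        lines = [
            "| " + " | ".join(header) + " |",
            "| " + " | ".join("---" for _ in header) + " |",
        ]
        for row in data[first_row:last_row]:
            lines.append("| " + " | ".join(str(value) for value in row) + " |")
        return "\n".join(lines)


producer = DocumentContextProducer(
    {"Sales": [["Region", "Revenue"], ["East", 100], ["West", 250], ["North", 75]]}
)
print(producer.blueprint())               # small summary, always sent
print(producer.pull_range("Sales", 0, 2))  # detail, fetched only when requested
```

The blueprint stays small no matter how many rows the sheet holds, while the pull call returns only the slice the reasoning engine asks for.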

Engineering domain knowledge of Excel

Managing context gives the agent a clear view of the workbook. The next challenge is action: knowing which of Excel’s thousands of functions and APIs to call to get the job done.

Excel spans thousands of API controls, including formulas, objects, and advanced features — a surface far too large for any current model to memorize or control directly. Instead of brute-forcing that complexity, we built distilled documentation into our reasoning engine — a compact, structured reference of Excel functions, objects, and specialized tool calls. Agent Mode can draw on this distilled knowledge to execute sophisticated tasks like building PivotTables, charts, slicers, and financial models.

By embedding only the essential information, the model gains expert-level fluency in Excel’s internal workings without overwhelming its context window, enabling accurate reasoning across the full feature set of the application.
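A rough way to picture distilled documentation is a compact reference from which only the relevant entries are handed to the model, within a size budget. The Python sketch below uses naive keyword matching purely as a stand-in; the post does not describe how entries are chosen, and `DISTILLED_DOCS` and `select_reference` are hypothetical names. The three Excel function summaries list required arguments only.

```python
# A tiny stand-in for a distilled reference: one short line per Excel function.
DISTILLED_DOCS = {
    "XLOOKUP": "XLOOKUP(lookup_value, lookup_array, return_array): find a value and return the matching item.",
    "SUMIFS": "SUMIFS(sum_range, criteria_range1, criteria1, ...): sum cells that meet all criteria.",
    "FILTER": "FILTER(array, include): return the rows of array where include is TRUE.",
}


def select_reference(task, docs, budget_chars):
    """Return entries whose function name appears in the task, within a character budget."""
    selected, used = [], 0
    for name, entry in docs.items():
        if name.lower() in task.lower() and used + len(entry) <= budget_chars:
            selected.append(entry)
            used += len(entry)
    return selected


context = select_reference("Use SUMIFS by region, then FILTER the top rows", DISTILLED_DOCS, 1000)
for entry in context:
    print(entry)
```

Only the SUMIFS and FILTER entries reach the prompt here; the XLOOKUP entry stays out of the context window because the task never mentions it.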

Validation-driven generation

In developing and evaluating our core coding and reflection loop, we observed that many spreadsheet errors are silent — formulas return values, but subtle mistakes remain hidden until they cascade into bad analysis. Relying on a single execution step is risky when the goal is trustworthy automation.

To counter this, Agent Mode in Excel reframes each tool call as an auditable, verifiable workflow. Before executing an action, our reasoning engine first generates lightweight tests to establish expected outcomes. These checks act as verifiable guardrails, ensuring that each step can be inspected and reproduced. Crucially, rather than hardcoding values, Agent Mode carries out all computations directly on the grid. This preserves the full dependency structure of the spreadsheet, allowing users to audit intermediate results, trace formulas, and verify correctness at every stage.

Across our quantitative evaluations, we have been able to drive double-digit accuracy improvements with this validation-infused approach.
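The two ideas in this section, writing checks before acting and keeping computations on the grid as formulas, can be shown with a toy Python sketch. It is not Agent Mode's implementation: the grid is a plain dictionary, the evaluator understands only single-column `=SUM(...)` formulas, and `evaluate` and `write_with_checks` are hypothetical helpers.

```python
import re


def evaluate(grid, cell):
    """Evaluate a cell holding a number or a single-column =SUM(A1:A3) formula."""
    value = grid[cell]
    if isinstance(value, str) and value.startswith("="):
        match = re.fullmatch(r"=SUM\(([A-Z])(\d+):\1(\d+)\)", value)
        if match is None:
            raise ValueError(f"unsupported formula: {value}")
        col, start, end = match.groups()
        return sum(evaluate(grid, f"{col}{row}") for row in range(int(start), int(end) + 1))
    return value


def write_with_checks(grid, cell, formula, checks):
    """Write a formula to a copy of the grid; keep it only if every check passes."""
    candidate = dict(grid)
    candidate[cell] = formula  # the formula is stored, not its computed value
    result = evaluate(candidate, cell)
    failures = [name for name, check in checks.items() if not check(result)]
    return (candidate, []) if not failures else (grid, failures)


grid = {"A1": 10, "A2": 20, "A3": 30}

# Checks are defined before any formula is written.
checks = {
    "matches independent total": lambda v: v == sum(grid[c] for c in ("A1", "A2", "A3")),
    "not below largest input": lambda v: v >= 30,
}

# A formula that silently misses a row returns a value but fails a check.
rejected_grid, rejected_failures = write_with_checks(grid, "A4", "=SUM(A1:A2)", checks)
accepted_grid, accepted_failures = write_with_checks(grid, "A4", "=SUM(A1:A3)", checks)
print(rejected_failures)    # ['matches independent total']
print(accepted_grid["A4"])  # =SUM(A1:A3)
```

Because the accepted cell holds `=SUM(A1:A3)` and not the number 60, the dependency on A1 through A3 stays visible and the total recalculates if the inputs change.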

An illustrative example of our validation-infused approach, with several interleaved “testing” steps in Agent Mode’s chain of thought.

Scaling quality with AI graders

As we evolve Agent Mode into a deeply integrated, context-aware companion for data workflows, AI graders have emerged as one of the most critical technical enablers driving quality, trust, and usability. They serve not only as evaluators of accuracy but also as definers of excellence, ensuring that results are not just correct, but also useful, complete, relevant, and delightful.

Graders are the mechanism through which we translate abstract quality goals into measurable, actionable standards. In Agent Mode, they underpin both offline evaluation pipelines and live user experience metrics, helping us answer key questions like:

  • Did Agent Mode fulfill the user’s intent?
  • Was the output accurate and verifiable?
  • Did the result feel native to Excel?
  • Was the experience satisfying and accessible?

Without graders, we would risk optimizing for superficial metrics, like response time or token count, while missing the deeper signals of user success.
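As a rough illustration of turning quality questions into measurable scores, the Python sketch below scores one response on several rubric dimensions and averages them. Everything here is hypothetical: in practice each grader would be an AI model judging the output, while these stand-ins are simple rule-based functions so the example runs on its own.

```python
from statistics import mean


def grade(response, graders):
    """Score one response on every rubric dimension; each grader returns a float in [0, 1]."""
    scores = {dimension: grader(response) for dimension, grader in graders.items()}
    scores["overall"] = mean(scores.values())
    return scores


# Rule-based stand-ins for AI graders, one per quality question.
graders = {
    "fulfills intent": lambda r: 1.0 if r["answered"] else 0.0,
    "accurate and verifiable": lambda r: 1.0 if r["cell"].startswith("=") else 0.0,
}

# A hardcoded value answers the question but cannot be traced or refreshed.
hardcoded = grade({"answered": True, "cell": "60"}, graders)
formula = grade({"answered": True, "cell": "=SUM(A1:A3)"}, graders)
print(hardcoded["overall"], formula["overall"])  # 0.5 1.0
```

Keeping the per-dimension scores alongside the overall average makes it possible to see which quality question a response failed, not just that it scored lower.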

Looking ahead

An early preview of Agent Mode in Excel is available starting today via the Frontier program for Microsoft 365 Copilot licensed customers and Microsoft 365 Personal or Family subscribers (under the Microsoft Services Agreement). Agent Mode works in Excel on the web and is coming soon to desktop. To try it, install the Excel Labs add-in and choose Agent Mode (Frontier). Learn more about it in our announcement blog.

This preview is just the beginning of our journey. We’re continuing to build a complete, M365-integrated experience that is trustworthy, reliable, and transparent, one that you can depend on for critical work. And from a developer perspective, we’re exploring extensibility options that would allow customers and partners to build custom solutions on top of our Agent Mode capabilities.

Over the coming weeks and months, we plan to fully integrate and iterate on this experience across all Excel clients. We’ll continue to improve core output quality, refine the Agent Mode interfaces in chat and on the grid, and incorporate user feedback to ensure the experience feels at home in Excel, while unlocking entirely new ways to model, analyze, and automate.

Updated Sep 29, 2025