Empowering sellers with a smarter way to prepare — chat naturally, uncover insights, and walk into meetings ready to win.
Sales teams rely on timely, contextual insights to prepare for customer meetings, manage complex accounts, and stay on top of their pipelines. Yet the information they need lives across multiple systems: CRM platforms, email threads, meeting notes, internal conversations, documents, and more. Sales agent in Microsoft 365 Copilot brings structure to this complexity by unifying CRM context and work data into one conversational, sales-savvy assistant.
To understand how well the Sales agent performs in real sales workflows, we designed a comprehensive evaluation that mirrors how sellers work. Our goal was simple: measure whether Sales agent can help reps ramp up faster, prepare more effectively, and make better decisions.
Today, we’re announcing the Microsoft Sales Bench—a new collection of evaluation benchmarks designed to assess the performance of AI-powered sales agents across real-world scenarios. This framework brings together purpose-built metrics, hundreds of sales-specific scenarios, and composite scoring validated by both human and AI judges.
The Sales Bench isn’t starting from scratch. It now formalizes and expands what began with the Sales Research Bench, published on October 21, 2025, which evaluates how AI solutions answer business research questions for sales leaders.
We’re extending the Microsoft Sales Bench with two additional benchmarks: the Microsoft Sales Qualification Bench, which measures how effectively AI agents qualify leads and generate high-quality pipeline, and the Microsoft Sales agent Bench, which measures how effectively AI agents help sellers learn about customer accounts, prepare for meetings, and review and stay up to date on pipelines.
This technical deep dive outlines how Sales agent is built, how it reasons over enterprise CRM and work data, and how we evaluated its accuracy, completeness, relevance, and usefulness across end-to-end sales workflows. We also detail the benchmark configurations, evaluation dataset, metrics used, and results comparing Sales agent against ChatGPT (GPT-5) baselines.
Sales agent architecture
Sales agent is built on a modular architecture designed to help organizations unlock the full value of their existing data while leveraging Microsoft’s AI ecosystem in a secure and compliant manner. Its design centers on a unified, agentic back-end that extracts sales-specific insights from a wide range of data sources and seamlessly integrates them with systems of record and existing business workflows. This AI-powered back-end fuels end-user experiences embedded directly into the seller’s flow of work across Microsoft 365.
Unified and agentic back-end
At the core of Sales agent is the Unified Sales agent back-end, which powers all AI-driven seller experiences. This layer delivers several architectural advantages that enable deeper insights, higher accuracy, and richer relevance for sellers, without requiring users to craft the perfect prompt:
- Autonomous agents generate insights: By processing emails, meetings, and public data through autonomous background agents, Sales agent can perform deeper analysis and generate comprehensive insights from customer and public data, persisted in a customer-owned Insights Data Store. With deep integration into Microsoft 365, this processing happens as soon as source data becomes available: when an email arrives, or when a meeting ends and its transcript is finalized.
- Sales-specific IP helps improve the relevance of insights: Sales agent leverages a robust set of proven, battle-tested sales-specific models and system instructions. This ensures all generated insights are tailored for seller workflows and eliminates the need for users to craft or iterate on prompts to get the results they need.
- CRM record mapping improves accuracy, completeness, and groundedness of insights: All seller-specific insights are automatically and intelligently mapped to CRM records. This significantly improves the accuracy of the insights retrieved for the contacts, accounts, and opportunities sellers work with every day, and makes insights derived from emails and meetings broadly accessible across the organization. The result: each seller benefits from a much wider knowledge base that is shared across the sales organization in a secure and compliant manner.
- Unified data layer supports secure and compliant data access: Sales agent relies on existing authentication and access-control mechanisms through a unified data access layer, with newly generated insights and CRM mappings stored in the customer-owned Insights Data Store. This helps ensure all data retrieval and processing occurs in a compliant manner using familiar enterprise security controls, while giving customers full control over their data.
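The post does not describe how insights are matched to CRM records. As a rough illustration only, the sketch below assumes the simplest possible rule: an insight from an email or meeting is attached to every CRM contact whose email address appears among the participants, and to that contact's account. The `Contact` and `Insight` types and the `map_insight_to_crm` function are hypothetical and are not part of any Sales agent API.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Contact:
    """Hypothetical, simplified CRM contact record."""
    contact_id: str
    email: str
    account_id: str


@dataclass
class Insight:
    """Hypothetical insight extracted from an email thread or meeting."""
    source: str              # e.g. "email" or "meeting"
    text: str
    participants: List[str]  # email addresses of everyone on the thread or meeting
    contact_ids: Set[str] = field(default_factory=set)
    account_ids: Set[str] = field(default_factory=set)


def map_insight_to_crm(insight: Insight, contacts: List[Contact]) -> Insight:
    """Link an insight to each known contact among its participants, and to that contact's account.

    Matching is an exact, case-insensitive comparison of email addresses (an assumption).
    """
    by_email = {c.email.lower(): c for c in contacts}
    for address in insight.participants:
        contact = by_email.get(address.lower())
        if contact is not None:  # participants not in the CRM (e.g. the seller) are skipped
            insight.contact_ids.add(contact.contact_id)
            insight.account_ids.add(contact.account_id)
    return insight


# Usage with made-up records.
contacts = [
    Contact("c1", "buyer@customer-a.example", "acct-a"),
    Contact("c2", "cfo@customer-b.example", "acct-b"),
]
insight = Insight(
    source="email",
    text="Customer asked for an ROI analysis before the pilot.",
    participants=["Buyer@Customer-A.example", "seller@contoso.example"],
)
mapped = map_insight_to_crm(insight, contacts)
print(mapped.account_ids)  # {'acct-a'}
```

A production mapper would also need to handle unknown participants, shared mailboxes, and contacts linked to several accounts; exact address matching is used here only to keep the example small.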
Integrated end user experiences
Sales agent consists of chat and guided non-chat seller experiences that are integrated directly into existing Microsoft 365 applications. The chat experiences in Sales agent offer natural language access to organizational knowledge, CRM data, and AI-generated insights. When a user submits a prompt, Sales agent leverages the latest innovation in the Microsoft 365 Copilot orchestrator pipeline, which securely grounds the request in the user’s CRM context, retrieves relevant Microsoft 365 signals, applies enterprise policies, and selects the best AI capabilities to generate a tailored response. Because seller-specific context is incorporated directly during grounding and retrieval, outputs remain personalized and actionable.
In addition to the chat experiences, Sales agent also provides a set of guided, purpose-built non-chat experiences designed around common seller tasks. Delivered directly inside Microsoft 365 Copilot, these experiences integrate smoothly into the seller’s flow of work and allow effortless transitions between conversational interactions and structured workflows.
The existing architecture of these end user experiences provides several key advantages:
- Microsoft 365 application context helps improve relevance and accuracy of insights: Smoothly integrated into existing Microsoft 365 applications, Sales agent leverages contextual data from the flow of work and customer-relevant contact information from Microsoft 365 data sources to fetch the most relevant insights, without requiring sellers to write complex prompts or dig for the right data across a wide range of tools.
- Microsoft 365 Copilot Orchestrator maximizes the conversationality of the seller experience: Sales agent leverages the Microsoft 365 Copilot Orchestrator for all chat-specific scenarios, which puts the latest Microsoft innovation in natural language processing and AI at the seller’s fingertips and reduces the time it takes for sellers to get the exact data they seek.
- The Unified and Agentic Sales agent back-end improves consistency of end user experiences: Existing AI and natural language tools are prone to generating inconsistent and varying results, which increases noise and can be daunting when sellers need deterministic, precise information for their customer engagements. The unified Sales agent back-end ensures consistency and predictability in the insights surfaced in chat and guided experiences without jeopardizing relevance or accuracy.
For additional details, see the Copilot orchestrator documentation.
Evaluation methodology
To keep the evaluations as close to real-life scenarios as possible, we built a realistic testbed that reflects the complexity of modern sales organizations. This allowed us to evaluate the agent end-to-end, from query understanding to grounded, actionable output.
A realistic sales environment — built from the ground up
To evaluate the Sales agent in a way that reflects how real sellers work, we built a full Dynamics 365 Sales environment populated with synthetically generated but realistic CRM data. Instead of relying on simplified test records or isolated samples, we recreated the rich, interconnected ecosystem a sales team operates in every day.
Our synthetic environment included:
- Complete account hierarchies that mirror real customer organizations
- Opportunities at various stages, with lifecycles that unfold over weeks or months
- Active and historical cases tied to real customer scenarios
- Contacts with seller assignments and defined roles, representing multi-stakeholder engagement
- Email threads between sellers and customers, capturing back-and-forth communication
- Teams meeting transcripts, reflecting ongoing conversations, sentiment, and follow-up items
Every synthetic record was designed to mimic real seller workflows: multiple stakeholders per account, sequential interactions, and detailed notes. We modeled the types of data patterns, relationship structures, and activity histories that sellers depend on to prepare for meetings, understand their customers, and manage their pipelines.
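A seeded generator is one way to produce this kind of interconnected data reproducibly. The minimal sketch below covers only accounts (with parent-child hierarchies) and opportunities; the stage names, value ranges, and the `generate` function are assumptions for illustration, not the generator used for this evaluation.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Tuple

STAGES = ["Qualify", "Develop", "Propose", "Close"]  # assumed stage names


@dataclass
class Account:
    account_id: str
    name: str
    parent_id: Optional[str]  # None for a top-level account


@dataclass
class Opportunity:
    opportunity_id: str
    account_id: str
    stage: str
    estimated_value: int


def generate(num_accounts: int, opps_per_account: int, seed: int = 0) -> Tuple[List[Account], List[Opportunity]]:
    """Build a small, reproducible synthetic CRM dataset (hypothetical helper)."""
    rng = random.Random(seed)  # a fixed seed makes the dataset reproducible
    accounts: List[Account] = []
    for i in range(num_accounts):
        # About a third of accounts become subsidiaries of an earlier account.
        # Parents always come from earlier accounts, so the hierarchy has no cycles.
        parent = None
        if accounts and rng.random() < 1 / 3:
            parent = rng.choice(accounts).account_id
        accounts.append(Account(f"acct-{i}", f"Synthetic Account {i}", parent))

    opportunities = [
        Opportunity(
            opportunity_id=f"{a.account_id}-opp-{j}",
            account_id=a.account_id,
            stage=rng.choice(STAGES),  # mix of pipeline stages
            estimated_value=rng.randrange(50_000, 1_000_000, 10_000),
        )
        for a in accounts
        for j in range(opps_per_account)
    ]
    return accounts, opportunities


accounts, opportunities = generate(num_accounts=20, opps_per_account=3, seed=42)
print(len(accounts), len(opportunities))  # 20 60
```

Contacts, cases, email threads, and meeting transcripts would hang off the same account IDs in the same way; they are omitted to keep the sketch short.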
This environment gave us a comprehensive and realistic foundation to test the Sales agent end-to-end: from ramping up on accounts, to preparing for meetings, and generating insights to help sellers take the right next actions.
Queries inspired by real sellers
With the dataset ready, we constructed a large query set derived from:
- Actual seller questions gathered during private preview
- Variants and perturbations of those queries to test robustness
- Coverage across all major sales scenarios
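The post does not say how the query variants were produced. As a hypothetical illustration of the kind of small, irrelevant input variation that the noise sensitivity metric targets, the sketch below derives meaning-preserving variants of a seller question with simple rules: a casing change, a leading filler phrase, dropped trailing punctuation, and an adjacent-letter typo. The `perturb` and `introduce_typo` functions are illustrative assumptions.

```python
import random
from typing import List


def introduce_typo(query: str, rng: random.Random) -> str:
    """Swap two adjacent interior letters in one randomly chosen word (hypothetical helper)."""
    words = query.split(" ")
    candidates = [i for i, w in enumerate(words) if len(w) >= 4 and w.isalpha()]
    if not candidates:
        return query
    i = rng.choice(candidates)
    w = words[i]
    j = rng.randrange(1, len(w) - 2)  # keep the first and last letters in place
    words[i] = w[:j] + w[j + 1] + w[j] + w[j + 2:]
    return " ".join(words)


def perturb(query: str, seed: int = 0) -> List[str]:
    """Return simple, meaning-preserving variants of a seller question."""
    rng = random.Random(seed)
    return [
        query.lower(),               # casing change
        "Quick question: " + query,  # leading filler
        query.rstrip("?.! "),        # dropped trailing punctuation
        introduce_typo(query, rng),  # single typo
    ]


question = "Which accounts in the Consulting vertical have open opportunities?"
for variant in perturb(question, seed=7):
    print(variant)
```

Rule-based edits like these are cheap and reproducible; paraphrases written by people or generated by a model would cover a wider range of phrasing.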
In total, we evaluated Sales agent on 547 questions that sellers routinely ask when preparing for meetings, reviewing pipelines, or getting up to speed on accounts.
For every query, we created a ground truth answer based on CRM and work data so we could measure accuracy and completeness precisely.
Benchmark configurations
To ensure a fair comparison between the Sales agent and ChatGPT, Microsoft designed a controlled evaluation environment where both systems received the same questions and operated over the same underlying data. ChatGPT was evaluated using an OpenAI Pro license with GPT-5 in Auto mode, chosen because the Pro tier offers “full access to the best of ChatGPT,” to help ensure high-quality responses without throttling.
ChatGPT was also configured with access to Teams, Outlook, and Calendar, reflecting a typical productivity assistant setup. This allowed it to draw on unstructured work context such as meeting summaries, emails, and schedules.
Because ChatGPT cannot directly access Dynamics 365 or Dataverse, we mirrored the CRM data into an Azure SQL instance. This mirroring preserved all data types, primary keys, foreign keys, and table relationships, creating a dataset structurally identical to what the Sales agent accessed natively. The Azure SQL copy was then exposed through the MCP SQL connector so ChatGPT could retrieve the same information through a standardized external interface aligned with its architecture.
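To illustrate why preserving keys and relationships matters, the sketch below uses Python's built-in `sqlite3` module as a local stand-in for the Azure SQL mirror; it does not use the MCP SQL connector. The two-table schema, its column names (loosely modeled on Dataverse naming), the `statecode = 0` convention for open opportunities, and the sample rows are all assumptions. With the foreign key in place, a single join answers the kind of pipeline question used in the evaluation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when this is on

# Hypothetical two-table slice of a CRM schema; column names loosely follow Dataverse naming.
conn.executescript("""
CREATE TABLE account (
    accountid         TEXT PRIMARY KEY,
    name              TEXT NOT NULL,
    industry          TEXT,
    numberofemployees INTEGER,
    revenue           REAL
);
CREATE TABLE opportunity (
    opportunityid   TEXT PRIMARY KEY,
    parentaccountid TEXT NOT NULL REFERENCES account(accountid),
    estimatedvalue  REAL,
    statecode       INTEGER NOT NULL  -- assumed: 0 = open, 1 = won, 2 = lost
);
""")

conn.executemany("INSERT INTO account VALUES (?, ?, ?, ?, ?)", [
    ("a1", "Acme Advisory", "Consulting", 360, 22_500_000.0),
    ("a2", "Beta Partners", "Consulting", 140, 9_200_000.0),
    ("a3", "Delta Group", "Consulting", 290, 13_900_000.0),
    ("a4", "Gamma Retail", "Retail", 1_050, 74_000_000.0),
])
conn.executemany("INSERT INTO opportunity VALUES (?, ?, ?, ?)", [
    ("o1", "a1", 1_500_000.0, 0),
    ("o2", "a1", 500_000.0, 0),
    ("o3", "a1", 900_000.0, 1),  # won, so excluded from open pipeline
    ("o4", "a2", 2_100_000.0, 0),
    ("o5", "a4", 800_000.0, 0),  # not a Consulting account
])

# Open pipeline per Consulting account, highest first. LEFT JOIN keeps accounts with no open deals.
QUERY = """
SELECT a.name,
       a.numberofemployees,
       a.revenue,
       COALESCE(SUM(CASE WHEN o.statecode = 0 THEN o.estimatedvalue END), 0) AS open_pipeline
FROM account AS a
LEFT JOIN opportunity AS o ON o.parentaccountid = a.accountid
WHERE a.industry = 'Consulting'
GROUP BY a.accountid, a.name, a.numberofemployees, a.revenue
ORDER BY open_pipeline DESC;
"""
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

The query groups by every selected account column instead of relying on SQLite's permissive handling of bare columns, so the same statement should also be accepted by SQL Server-family databases such as Azure SQL.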
Both systems were evaluated using the same question set, the same dataset, and the same evaluation rubric. Responses from Sales agent and ChatGPT were judged using the same LLM evaluators and human verification, ensuring consistent scoring and minimizing setup-related bias.
Evaluation metrics
We evaluated response quality across six dimensions that reflect what sellers care about in day-to-day workflows:
| Metric | Definition |
|---|---|
| Accuracy | Measures how well the response matches the ground truth in correctness and coverage; penalizes contradictions and major omissions. |
| Groundedness | Ensures the response is strictly supported by the provided ground truth/reference, avoiding unsupported or hallucinated details. |
| Completeness | Assesses whether the response fully covers all essential aspects of the query/task without missing key details. |
| Relevance | Checks that the response stays on-topic and aligned with the user’s intent and context, avoiding unrelated content. |
| Noise Sensitivity | Evaluates robustness to minor, irrelevant variations in the input; a robust system gives consistent answers despite small “noise.” |
| Conversationality | Rates clarity, readability, natural flow, and tone; ensures structure and wording are appropriate for the intended audience. |
Each metric is scored independently on a 0–10 scale, where higher scores indicate stronger performance. We used an LLM-as-a-judge approach to score outputs against the ground truth and rubric and manually reviewed a sampled subset of evaluations to calibrate the judges and validate scoring consistency. To reduce judge variance and mitigate hallucination risk, each sample was evaluated five times, and the mode across runs was recorded as the final score.
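The scoring rule described above (five judge runs per sample, mode as the final score, then averages across prompts) can be sketched as follows. This is a minimal illustration, not the evaluation harness: the `final_score` and `metric_averages` names are hypothetical, and because the post does not say how ties between equally frequent scores are broken, the sketch assumes the conservative choice of keeping the lowest tied score.

```python
from collections import defaultdict
from statistics import mean, multimode
from typing import Dict, List, Tuple


def final_score(runs: List[int]) -> int:
    """Collapse five judge runs for one sample and metric into a single 0-10 score (the mode)."""
    if len(runs) != 5 or not all(0 <= r <= 10 for r in runs):
        raise ValueError("expected exactly five scores on the 0-10 scale")
    # Assumption: when several scores are equally frequent, keep the lowest (conservative) one.
    return min(multimode(runs))


def metric_averages(judgments: Dict[Tuple[str, str], List[int]]) -> Dict[str, float]:
    """Average the per-sample final scores for each metric.

    `judgments` maps (sample_id, metric) to the five raw judge scores for that pair.
    """
    per_metric: Dict[str, List[int]] = defaultdict(list)
    for (_sample_id, metric), runs in judgments.items():
        per_metric[metric].append(final_score(runs))
    return {metric: mean(scores) for metric, scores in per_metric.items()}


# Usage with made-up judge scores for two samples and two metrics.
judgments = {
    ("q1", "Accuracy"): [8, 8, 9, 8, 7],    # mode 8
    ("q2", "Accuracy"): [6, 7, 7, 7, 6],    # mode 7
    ("q1", "Relevance"): [9, 9, 9, 8, 9],   # mode 9
    ("q2", "Relevance"): [7, 8, 7, 8, 10],  # tie between 7 and 8 -> 7
}
print(metric_averages(judgments))
```

Taking the mode rather than the mean keeps a single outlier judge run from shifting a sample's score, which is the variance-reduction effect the methodology describes.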
Evaluation results
To understand how well the Sales agent performs across real sales workflows, Microsoft evaluated it on 547 total prompts spanning account research, meeting preparation, meeting insights, and CRM data Q&A scenarios. Each system, Sales agent and ChatGPT, received the exact same questions and operated over the same mirrored dataset.
Across every high-impact seller workflow, Sales agent delivered more accurate, complete, and relevant responses based on the rubrics defined above, particularly in scenarios that require deep CRM grounding and fusion of structured and unstructured context.
When averaged across all prompts, Sales agent outperformed ChatGPT, most notably in accuracy, relevance, and completeness, three dimensions that directly impact seller productivity.
Sales agent consistently produced more accurate, complete, and contextually relevant responses than ChatGPT, especially in the workflows that matter most to sellers: meeting insights, meeting prep, and account ramp-up. Sales agent excelled at interpreting and extracting insights from multi-turn meeting histories, an area where fusing CRM context with unstructured meeting transcripts is critical. ChatGPT scored lower in large part because the Teams connector in ChatGPT lacks access to meeting transcripts.
Where ChatGPT offers strong general reasoning, the Sales agent brings CRM awareness, enterprise grounding, and workflow-specific intelligence that elevate seller productivity and readiness in a way general-purpose assistants did not match.
Sample prompts and responses
Below are sample questions and responses from Sales agent from the evaluations for the different scenarios. All data in the sample responses are synthetic. References to real companies, products or people, and associated data are simulated. No affiliation, endorsement, or validation by the referenced companies should be inferred.
Insights from past customer meetings
Question
Based on meetings that happened in October and November 2025, What product information has Contoso shared with Ford Motor Co?
Sales agent response
Between October and November 2025, Contoso shared the following product information with Ford Motor Co during multiple meetings:
🧠 Product discovery meeting
Date: 2025-10-27 07:00 UTC
Key product information shared:
- Charging intelligence platform:
  - Integrates with existing charger hardware via API.
  - Aggregates telemetry data from multiple vendors.
  - Uses AI for predictive failure detection and load optimization.
  - Offers energy cost forecasting and smart charging windows, potentially saving 15–20% on energy.
  - Provides advanced analytics linking charger usage and site profitability to vehicle sales.
  - Scales to monitor over 2,500 charging points globally.
- Customer interest & next steps:
  - Ford expressed interest in data standardization, ROI analysis, and usage pattern correlation.
  - Agreed to share sample telemetry data after signing an NDA.
  - Planned a follow-up session with a solution architect to demonstrate platform capabilities. [Product Discovery], [Product Discovery]
🔍 Technical demo & deep dive
Date: 2025-10-28 07:00 UTC
Key product information shared:
- Predictive maintenance solution:
  - AI models predict component failures 7 days in advance.
  - Alerts integrate with Ford’s existing maintenance systems.
  - Data processed within Ford’s private tenant; no telemetry leaves their network.
  - Achieves 20–25% reduction in unscheduled repairs and fewer truck dispatches.
- Energy optimization:
  - Energy optimizer forecasts load to shift charging to off-peak hours.
  - Estimated savings of $9,000/month in the test region.
- Uptime visibility:
  - Standardized APIs across vendors.
  - Combined heat map dashboard to address access and visibility issues.
- Next steps:
  - Pilot deployment at 450 sites planned for Q2 2026, pending procurement review.
  - Sellers to prepare ROI and TCO documentation for Ford’s procurement team. [Technical...Deep Dive]
Would you like a summary of the action items or objections raised during these meetings as well?
Questions on CRM data
Question
Get me all accounts in the Consulting vertical. Show the results as a table with a row for each account. For each account, show information about the account name, number of employees, primary contact, annual revenue from each account and total estimated revenue from open opportunities as columns. Sort the table by revenue from open opportunities.
Sales agent response
Here’s the plan to fulfill your request:
- Query CRM for all accounts in the Consulting vertical using the QnA action.
- For each account, retrieve:
o Account Name
o Number of Employees
o Primary Contact
o Annual Revenue
o Total Estimated Revenue from Open Opportunities
- Sort the results by Total Estimated Revenue from Open Opportunities in descending order.
- Present the data in a table format with one row per account.
I’ll start by querying the CRM now.
Here are the accounts in the Consulting vertical, sorted by Total Estimated Revenue from Open Opportunities (highest first):
| Account Name | Employees | Primary Contact | Annual Revenue | Revenue from Open Opportunities |
|---|---|---|---|---|
| Omnilytix Data | 360 | Aisha Tan | $22,500,000 | $2,730,000 |
| Quantiva Labs | 320 | Not Available | $15,800,000 | $2,360,000 |
| Zyphoria Retail | 1,050 | Not Available | $74,000,000 | $2,340,000 |
| Xelentra Legal | 140 | Jamie Young | $9,200,000 | $2,180,000 |
| Yum China Holdings | 540 | Anjali Rao | $28,000,000 | $2,060,000 |
| Alpine Ski House | 4,000 | Hester de Villiers | $876,675.61 | $1,671,249 |
| CVS Health Corp | 2,000 | Helena Botha | $269,358 | $1,527,713 |
| Bellows College | 5,000 | Jarmila Konecna | $514,777 | $1,462,140 |
| Adventure Works Cycles | 3,000 | Louis de Wet | $771,861 | $1,427,064 |
| Boulder Innovations | 1,200 | Laura Melgar | $241,210 | $1,399,709 |
| Virexa Pharmaceuticals | 1,480 | Carla Fernandes | $115,000,000 | $0 |
| Clariwave Communications | 290 | Carla Mendez | $13,900,000 | $0 |
Next steps:
Would you like me to export this table to Excel, or create a chart (e.g., bar chart showing revenue from open opportunities)? Or should I also include account URLs if available?
Ramp up on customer accounts
Question #115
Get me the summary of Apple. Show the opportunities as a table and sort them by close date.
Sales agent response
Here’s the summary for the Apple account along with its opportunities sorted by close date:
🧾 Account summary: Apple [Apple]
- Location: 4567 Main St Buffalo, Germany
- Industry: Professional Services (Legal, Accounting)
- Employees: 85
- Revenue: €4,300,000
- Primary Contact: Abigail Rodriguez, Chief Revenue Officer
- Other Key Contacts:
  - Max Hoffmann, Chief AI Officer
  - Sophie Klein, Director of Technology
  - Mila Schröder, Director of Finance
  - Mason Smith, VP of Sales
- Recent activity:
  - No recent meetings recorded
  - Active support cases include billing discrepancies, ERP integration errors, mobile app crashes, and password reset issues, all managed by Mina Amiri
📊 Opportunities for Apple (Sorted by Close Date)
| Name | Stage | Estimated Value | Close Date | Owner | Link |
|---|---|---|---|---|---|
| Finance Automation Platform | Pipeline | $760,000.00 | 11 Oct 2025 | Mina Amiri | [Finance Au...n Platform] |
| HR Digital Transformation | Best case | €820,000.00 | 12 Nov 2025 | Mina Amiri | [HR Digital...sformation] |
| Business Process Optimization | Committed | €910,000.00 | 4 Feb 2026 | Mina Amiri | [Business P...timization] |
| Procurement Intelligence Engine | Best case | €690,000.00 | 3 Oct 2026 | Mina Amiri | [Procuremen...nce Engine] |
Would you like a breakdown by forecast category or help preparing for a meeting with Apple?
Get started with Sales agent now!
- Sales agent - the Copilot agent that speaks sales
- Overview of Sales agent in Microsoft 365 Copilot (preview) | Microsoft Learn
- Set up Sales agent in Microsoft 365 Copilot (preview) | Microsoft Learn
- Use Sales agent in Microsoft 365 Copilot (preview) | Microsoft Learn
- Prepare for your sales meetings in Microsoft 365 Copilot (preview) | Microsoft Learn