
Copilot Studio Blog
6 MIN READ

Automate agent evaluation with the Evaluation APIs

Efrat_Gilboa
Microsoft
Apr 29, 2026

When you build an agent in Microsoft Copilot Studio, you want confidence that it behaves exactly as intended: answering correctly, using the right tools, and following the logic you designed. Agent Evaluation (generally available) provides this foundation by allowing you to define test sets, run them against your agent, and understand how it performs.

As agents evolve from experimentation into real production scenarios, this foundation becomes part of an ongoing process. Evaluation is no longer a one-time step, but a continuous part of the development lifecycle. Teams are looking to validate changes quickly, track quality over time, and ensure consistent behavior across updates, environments, and use cases.

To support this, evaluation scales alongside your agents. Automated evaluation enables teams to expand their testing coverage, run evaluations more frequently, and establish consistent quality signals across the lifecycle. It brings evaluation closer to the way modern systems are built: iterative, data-driven, and continuously improving.

To realize this at scale, evaluation needs to integrate directly into your existing workflows and systems.

Now these same evaluation capabilities are available programmatically through the Power Platform REST API and connectors. Here’s how you can use the Evaluation APIs to automate agent evaluation as part of your development and release workflows.

What you can do with the Evaluation APIs

The Evaluation APIs expose the core evaluation experience as programmable endpoints. Using those endpoints, you can trigger evaluations on demand, integrate evaluations into pipelines and approval workflows, and design processes that rely on the results. Whether you prefer a code-first approach with APIs or a low-code experience using Microsoft Power Automate flows and Copilot Studio agent workflows, you can easily automate when and how evaluations run – and use the results as quality gates.

Here are the capabilities included in the Maker Evaluation API:

| Capability | What it does |
| --- | --- |
| List test sets | Retrieve the test sets configured for your agent |
| Run a test set | Trigger a test set to execute against your agent |
| Poll run status | Poll a running evaluation to see when it completes |
| Retrieve results | Retrieve detailed results, including per-test-case scores |
| List historical runs | List all previous evaluation runs for reporting or comparison |

These APIs work with any HTTP client, Python scripts, Azure DevOps pipelines, GitHub Actions, or custom tooling. For teams working in the Power Platform ecosystem, the same actions are available through the Microsoft Copilot Studio certified connector, which integrates directly with Power Automate flows.

When to use Evaluation APIs

The Evaluation APIs exist so you can run evaluations without manually triggering them, letting evaluation happen automatically as part of your pipelines, your flows, or your own tools. By default, runs evaluate the agent’s unpublished (draft) version, which makes them especially useful for CI/CD and pre-publish validation. The Copilot Studio UI is still the right place for one-off, interactive evaluation; reach for the APIs when you want evaluation to happen on its own.

Here are three common scenarios.

1. Add evaluation to your CI/CD pipeline

When your agent source lives in a repository, every pull request and every merge to main is an opportunity to validate quality before changes reach production. Wire the Evaluation APIs into Azure DevOps, GitHub Actions, or any CI runner: each pipeline run triggers an evaluation, waits for the result, and passes or fails the build based on the score. Quality regressions are caught at PR time, not in production.
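The pass/fail decision in such a pipeline can be as simple as a threshold check. A minimal sketch in Python, assuming per-test-case scores on a 0–1 scale (the exact result schema is whatever the results endpoint returns for your agent):

```python
def passes_quality_gate(case_scores, threshold=0.8):
    """Return True when the average evaluation score meets the threshold.

    case_scores is a list of per-test-case scores pulled from the run
    results; an empty list fails the gate, since no results means no
    evidence of quality.
    """
    if not case_scores:
        return False
    return sum(case_scores) / len(case_scores) >= threshold
```

A pipeline step would trigger the run, poll for completion, feed the scores into this check, and exit non-zero (failing the build) when it returns False.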

2. Trigger evaluation from a Power Automate flow

Many events that may affect agent quality happen outside Copilot Studio: a knowledge source is updated in SharePoint, a new article is added to a file library, a Dataverse record changes agent behavior. Use Power Automate (with the Microsoft Copilot Studio certified connector) to listen for these events and kick off an evaluation test run automatically, then route the results to Teams, email, or whichever channel your team watches.

3. Embed evaluation in your own tools

Sometimes you want evaluation as part of a tool you’re already building: a Center of Excellence dashboard tracking quality across many agents, an admin script that confirms every new agent has been evaluated before publish, or a custom integration that adds evaluation to an existing approval workflow. The APIs let you call evaluation programmatically from any system, with whatever logic fits your scenario.

How an evaluation run works through the API

The evaluation flow follows a simple pattern: Trigger → Poll → Get Results.

  1. Trigger: Send a POST request to start an evaluation run for a specific test set
  2. Poll: Check the run status until it completes (the execution is asynchronous)
  3. Get results: Retrieve the score and detailed per-test-case outcomes

Optionally, you can pass an MCS Connection ID when triggering a run. This allows the evaluation to run using an authenticated user context, enabling access to tools and knowledge sources that require authentication. Without it, the evaluation will run anonymously.
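The polling step can be sketched in Python. The helper below accepts any `get_status` callable, so the loop stays independent of the HTTP layer; the `state` field name and its terminal values (`Completed`, `Failed`) are assumptions to adapt to the actual response payload:

```python
import time

def wait_for_run(get_status, run_id, interval=10, timeout=600):
    """Poll get_status(run_id) until the run reaches a terminal state.

    get_status is any callable that returns the run's status dict
    (for example, a thin wrapper around the run-status endpoint).
    Raises TimeoutError if the run does not finish within `timeout`.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = get_status(run_id)
        if status.get("state") in ("Completed", "Failed"):
            return status
        time.sleep(interval)
    raise TimeoutError(f"evaluation run {run_id} did not complete within {timeout}s")
```

Injecting the status callable also makes the loop easy to unit-test with a fake before wiring it to the real endpoint.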

Working with the Evaluation APIs: the key endpoints

Below are the core Evaluation API endpoints available today, starting with how to retrieve test sets and trigger evaluation runs programmatically.

Prerequisites

Grant your app registration the required API permissions:

  1. Go to https://portal.azure.com
  2. Go to App Registrations
  3. Search for your App
  4. Click API permissions
  5. Click Add a permission
  6. Click APIs my organization uses
  7. Search "Power Platform API"
  8. Click Delegated permissions
  9. Expand CopilotStudio
  10. Select MakerOperations.Read, MakerOperations.ReadWrite
  11. Click Add Permissions

 

Endpoint 1: Retrieve available test sets

Use this endpoint to list all evaluation test sets defined for a specific agent.

Request:

GET https://api.powerplatform.com/copilotstudio/environments/{yourEnvironment}/bots/{yourCdsBotId}/api/makerevaluation/testsets?api-version=1

Expected result:
Returns the list of maker evaluation test sets associated with the agent.

Sample response:

Endpoint 2: Retrieve a specific test set

Once you have a test set ID, you can fetch its full definition.

Request

GET https://api.powerplatform.com/copilotstudio/environments/{yourEnvironment}/bots/{yourCdsBotId}/api/makerevaluation/testsets/{yourTestSetId}?api-version=1

Expected result
Returns the full configuration and structure of the selected test set.

Sample response:

Endpoint 3: Trigger an evaluation run

This endpoint allows you to programmatically start an evaluation run for a given test set.

The Body consists of a JSON object with the following attributes:

McsConnectionId - string value. If an empty string is provided, the evaluation runs anonymously, meaning tools and knowledge sources are not used. Agents that rely on authenticated connectors, actions, or auth-gated knowledge sources will therefore produce different (likely worse) evaluation results.

RunOnPublishedBot - optional boolean value, defaults to false (run against the draft version); set it to true to run against the published version.

EvaluationRunName - optional string value, useful for naming runs in dashboards.

Request

POST https://api.powerplatform.com/copilotstudio/environments/{yourEnvironment}/bots/{yourCdsBotId}/api/makerevaluation/testsets/{yourTestSetId}/run?api-version=1

Body

{
  "runOnPublishedBot": {boolean value},
  "mcsConnectionId": "{yourMCSConnectionId}",
  "evaluationRunName": "{yourEvaluationRunName}"
}

Sample request:

Sample response:
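A small Python helper can assemble this body. The camelCase key names follow the sample above; whether the API treats them case-insensitively is not verified here:

```python
import json

def build_run_body(mcs_connection_id="", run_on_published_bot=False, run_name=None):
    """Serialize the trigger-run request body.

    An empty mcs_connection_id means the run executes anonymously;
    run_name is omitted entirely when not provided, since it is optional.
    """
    body = {
        "mcsConnectionId": mcs_connection_id,
        "runOnPublishedBot": run_on_published_bot,
    }
    if run_name:
        body["evaluationRunName"] = run_name
    return json.dumps(body)
```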


How to obtain mcsConnectionId

  1. Go to: https://make.powerautomate.com
  2. Open Connections from the side menu
  3. Select the relevant Microsoft Copilot Studio connection
  4. Copy the connection ID from the URL

This connection ID will look something like:

https://make.powerautomate.com/environments/Default-00000000-0000-0000-0000-000000000000/connections/shared_microsoftcopilotstudio/shared-microsoftcopi-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/details
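If you script this step, the connection ID can be pulled out of that URL. A hypothetical helper, assuming the `/connections/<connector>/<connection-id>/details` path shape shown above:

```python
def connection_id_from_url(url):
    """Extract the connection ID segment from a connection details URL.

    The ID is the second path segment after "connections":
    .../connections/<connector-name>/<connection-id>/details
    """
    parts = url.rstrip("/").split("/")
    return parts[parts.index("connections") + 2]
```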

Note: One run at a time
 The API returns HTTP 422 if you try to start a run while another is already in progress for the same agent.

Endpoint 4: Get evaluation run status and results

After triggering a run, use the returned run ID to retrieve status and results.

Request

GET https://api.powerplatform.com/copilotstudio/environments/{yourEnvironment}/bots/{yourCdsBotId}/api/makerevaluation/testruns/{yourTestRunId}?api-version=1

Expected result
Returns the run status and, once completed, the evaluation results.

Sample response:
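Once the run completes, you typically reduce the payload to a summary for logging or gating. A sketch, assuming hypothetical `testCases` and `score` field names that you would map to the actual response schema:

```python
def summarize_results(run):
    """Condense a completed run's payload into a small summary dict.

    Assumes run["testCases"] is a list of dicts each carrying a numeric
    "score"; adjust the field names to the real response shape.
    """
    scores = [case["score"] for case in run.get("testCases", [])]
    average = sum(scores) / len(scores) if scores else 0.0
    return {
        "cases": len(scores),
        "average": average,
        "minimum": min(scores, default=0.0),
    }
```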

Endpoint 5: List previous evaluation runs

This endpoint is useful for tracking trends, building dashboards, and supporting automated decision logic.

Request

GET https://api.powerplatform.com/copilotstudio/environments/{yourEnvironment}/bots/{yourCdsBotId}/api/makerevaluation/testruns?api-version=1

Expected result
Returns an array of previous evaluation runs, each with the same schema as the run details API.

Sample response:
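For automated decision logic over the history, a simple regression check might look like this. It assumes each run carries a numeric `score` and that the list is ordered oldest to newest, both of which you should verify against the actual response:

```python
def detect_regression(runs, tolerance=0.05):
    """Return True if the latest run scored notably worse than the one before.

    tolerance absorbs small run-to-run noise so only meaningful drops
    (here, more than 0.05) count as regressions.
    """
    if len(runs) < 2:
        return False
    return runs[-1]["score"] < runs[-2]["score"] - tolerance
```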

 

Start using the Evaluation APIs today

Pick a test set, call the API, and see what your agent scores. That first run gives you a baseline. From there, you can automate evaluations into your workflow, set thresholds, and build the checks that make sense for your team. The APIs are available now. Start simple, and build from there.

Sign in to Copilot Studio to get started today.

Updated Apr 29, 2026
Version 1.0