Building Durable and Deterministic Multi-Agent Orchestrations with Durable Execution

greenie-msft

Microsoft

May 19, 2025

Explore how Durable Execution can be leveraged to achieve efficient and deterministic multi-agent orchestration, ensuring predictable outcomes and enhancing the resiliency of agentic workflows.

Durable Execution

Durable Execution is a reliable approach to running code, designed to handle failures smoothly with automatic retries and state persistence. It is built on three core principles:

Incremental Execution: Each operation runs independently and in order.
State Persistence: The output of each step is durably saved to ensure progress is not lost.
Fault Tolerance: If a step fails, the operation is retried from the last successful step, skipping previously completed steps.
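The three principles above can be sketched in a few lines of plain Python. This is only an illustrative model, not how Durable Functions is implemented: the `checkpoints` dictionary stands in for a durable state store, and the step names, the simulated failure, and the `run_durably` and `run_with_retries` helpers are all hypothetical.

```python
# Hypothetical in-memory stand-in for a durable state store.
checkpoints = {}
# Records every time a step body actually runs.
calls = []
# Simulate one transient failure in the "transform" step.
failures_left = {"transform": 1}


def extract(_):
    calls.append("extract")
    return [1, 2, 3]


def transform(rows):
    calls.append("transform")
    if failures_left["transform"] > 0:
        failures_left["transform"] -= 1
        raise RuntimeError("transient failure")
    return [r * 10 for r in rows]


def load(rows):
    calls.append("load")
    return sum(rows)


def run_durably(steps, value):
    """Run steps in order, skipping any step whose output is already saved."""
    for name, step in steps:
        if name in checkpoints:
            value = checkpoints[name]  # completed earlier: reuse saved output
            continue
        value = step(value)            # may raise; earlier checkpoints survive
        checkpoints[name] = value      # persist the output before moving on
    return value


def run_with_retries(steps, value, max_attempts=3):
    """Retry the whole sequence; completed steps are skipped on each retry."""
    for attempt in range(1, max_attempts + 1):
        try:
            return run_durably(steps, value)
        except RuntimeError:
            if attempt == max_attempts:
                raise


STEPS = [("extract", extract), ("transform", transform), ("load", load)]
result = run_with_retries(STEPS, None)
```

In this run, `extract` executes once even though the sequence is attempted twice, because its output was saved before `transform` failed.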

Durable Execution is particularly beneficial for scenarios requiring stateful chaining of operations, such as order-processing applications, data processing pipelines, ETL (extract, transform, load), and, as we'll get into in this post, intelligent applications with AI agents. Durable execution simplifies the implementation of complex, long-running, stateful, and fault-tolerant application patterns. Technologies like Durable Functions provide a programming model that makes the implementation of these patterns straightforward. Some common stateful application patterns that require stateful chaining and are easily implemented with durable execution technologies such as Durable Functions include:

Stateful app patterns easily implemented with Durable Functions

Durable Task Programming Model

Simple .NET function chaining example demonstrating the durable task programming model

Simple Python function chaining example demonstrating the durable task programming model

Before solutions like Azure Durable Functions, developers had to manually coordinate operations and maintain state using infrastructure like message queues and state stores, which added complexity to the code and increased the operational maintenance burden. Durable Functions streamlines this process by providing a programming model backed by a durable state store, enabling developers to define a series of steps to be executed in a specific order. This is called an orchestrator function. Activity functions within the orchestrator function are the "steps," and the durable task runtime ensures each step is scheduled in order and executed on your compute of choice, with outputs persisted.
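To make the orchestrator and activity split concrete, here is a toy model in plain Python. It is a sketch under stated assumptions rather than the Durable Functions API: the orchestrator is a generator that yields requests for activities, and a hypothetical `run` driver plays the role of the runtime by answering each request from a saved `history` list when it can, and executing the activity only when no saved output exists. The `say_hello` activity and the `ACTIVITIES` registry are made up for illustration.

```python
from typing import Any, Callable, Dict, List


# Hypothetical activity function (one "step"); the name is illustrative.
def say_hello(city: str) -> str:
    return f"Hello, {city}!"


# Hypothetical registry the toy runtime uses to look up activities by name.
ACTIVITIES: Dict[str, Callable[[Any], Any]] = {"say_hello": say_hello}


def orchestrator():
    """Declare the order of steps by yielding (activity_name, input) requests."""
    greetings = []
    for city in ("Tokyo", "Seattle", "London"):
        greeting = yield ("say_hello", city)
        greetings.append(greeting)
    return greetings


def run(orchestrator_fn, history: List[Any], executed: List[tuple]) -> Any:
    """Drive the orchestrator, replaying saved outputs before doing new work."""
    gen = orchestrator_fn()
    step = 0
    try:
        request = next(gen)
        while True:
            if step < len(history):
                output = history[step]          # replay: reuse persisted output
            else:
                name, arg = request
                output = ACTIVITIES[name](arg)  # execute the activity for real
                history.append(output)          # persist the output
                executed.append(request)
            step += 1
            request = gen.send(output)
    except StopIteration as done:
        return done.value


# Pretend an earlier run persisted the first step and then crashed.
history = ["Hello, Tokyo!"]
executed = []
result = run(orchestrator, history, executed)
```

Only the Seattle and London activities execute in this run; the Tokyo result comes from `history`. The orchestrator code itself has to be deterministic for this replay to line up with the saved outputs, which is why the work with side effects lives in activities.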

Durable Execution for Orchestrating Agents

With the rapid advancements in AI, a growing number of scenarios require orchestration, particularly those that involve multiple AI agents within an application. These agents often work together to accomplish a larger task. Two emerging designs for these applications are deterministic agentic workflows and self-directed agentic workflows:

Deterministic Agentic Workflows: Agents work together through a series of predefined steps to accomplish a larger task, leading to a deterministic result.

A Deterministic Agentic Workflow orchestrates a series of predefined steps, each calling sub-agents to achieve a deterministic outcome.

Self-Directed Agentic Workflows: Agents dynamically explore and determine the workflow plan as they proceed.

Self-directed Agentic Workflow determining the sequence of steps during execution.

Each approach fits different business scenarios and requirements. However, as we're learning, many scenarios benefit from deterministic outcomes, and durable execution truly shines in the deterministic agentic workflow pattern. It excels at providing efficient and reliable deterministic outcomes by following a predefined path that maps to orchestration and activity functions. The programming model makes it easy to call your agents independently and implement common agent app patterns, such as prompt chaining (using function chaining) and parallelization (using fan-out/fan-in). For more on this, please reference this insightful blog post by my colleague Chris Gillum.
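The two patterns named above can be shown in shape with stub agents. This sketch assumes nothing about any real agent framework: `outline_agent`, `draft_agent`, and `review_agent` are hypothetical placeholders for LLM calls, and a thread pool stands in for the parallel scheduling a durable runtime would provide. Persistence is deliberately left out to keep the focus on the control flow.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical stub agents; a real application would call an LLM in each one.
def outline_agent(topic):
    return f"outline({topic})"


def draft_agent(outline):
    return f"draft({outline})"


def review_agent(args):
    reviewer, draft = args
    return f"{reviewer}: ok {draft}"


def prompt_chain(topic):
    """Prompt chaining: each agent's output is the next agent's input."""
    return draft_agent(outline_agent(topic))


def fan_out_fan_in(draft, reviewers):
    """Fan out one call per reviewer in parallel, then gather every result."""
    with ThreadPoolExecutor() as pool:
        # pool.map returns results in the same order as the inputs.
        return list(pool.map(review_agent, [(r, draft) for r in reviewers]))


draft = prompt_chain("Kyoto")
reviews = fan_out_fan_in(draft, ["style", "facts"])
```

Because the set of steps is fixed in code, the same input always follows the same path, even though a real LLM behind each stub could return different text on each call.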

Self-directed agentic workflows are advantageous for unpredictable, creative tasks where the agents can determine their plan during execution. However, this can be less efficient and lead to non-deterministic outcomes, which may cause undesirable results.

Using durable execution for your agent orchestration enhances the resiliency of your agentic workflows. If any step fails, there’s no need to start from the beginning. Given that requests to LLMs can be expensive and may yield different outcomes, durable execution ensures that your orchestrations can recover from their last successful step.

Let’s look at a specific example where I used durable execution, specifically Azure Durable Functions, to implement a multi-agent application that requires durability: the Travel Planner Assistant.

The Travel Planner Assistant

Travel planning inherently follows a structured sequence – selecting destinations, crafting itineraries, gathering local insights, and booking the trip. This makes it ideal for an agentic workflow with predefined steps, rather than a self-directed agentic workflow with exploration. The outcome must be deterministic – we want a complete travel itinerary and a fully booked trip.

The application exposes a durable function that schedules a predefined agentic workflow (orchestration) to create a travel plan, which will then be used to book the trip. The orchestration interacts with specialized sub-agents for the first three steps. These include:

Destination Recommender Agent: Provides global knowledge across thousands of locations.
Itinerary Planner Agent: Creates a daily itinerary based on a deep understanding of the specific location’s logistics and seasonal considerations.
Local Recommendations Agent: Offers popular attractions to visit.

Orchestration Activity Function Calls - Sequential AI agent activities. Each activity is executed as a separate function with its own context.

By using Durable Functions to coordinate these specialized agents, the travel planner agent creates a more accurate and comprehensive travel plan than a single generalist agent. Once the travel plan has been created, Durable Functions orchestrations provide built-in support for human interaction, allowing human approval of the travel plan before proceeding to book the trip. This can be crucial in some scenarios because, despite the advancements in agents and LLMs, there are still critical tasks that require human input. Relying solely on LLM decision-making without review for such important tasks can be risky, and human approval helps ensure accuracy and reliability. Seeking this approval can be a long-running operation that may encounter failures along the way. However, by leveraging Durable Functions, the application benefits from resiliency through built-in state persistence, ensuring the orchestration can resume in the event of a failure, such as a downstream dependency outage or an application restart while waiting for approval.
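The reason a pending approval can survive a restart is that the waiting orchestration exists as persisted state rather than as a live thread. The sketch below models that idea only; it is not the Durable Functions human-interaction API. The `durable_store` dictionary of JSON strings, the status names, and both helper functions are hypothetical.

```python
import json


def start_orchestration(store, instance_id, plan):
    """Persist the finished travel plan and park the workflow until approval."""
    state = {"plan": plan, "status": "WaitingForApproval"}
    store[instance_id] = json.dumps(state)


def raise_approval_event(store, instance_id, approved):
    """Resume a parked workflow from persisted state when the human responds."""
    state = json.loads(store[instance_id])
    if state["status"] != "WaitingForApproval":
        return state["status"]  # already decided: ignore duplicate events
    state["status"] = "Booked" if approved else "Rejected"
    store[instance_id] = json.dumps(state)
    return state["status"]


# Hypothetical stand-in for durable storage: only serialized state lives here.
durable_store = {}
start_orchestration(durable_store, "trip-1", {"destination": "Lisbon", "days": 3})

# Simulate an application restart: no in-memory workflow object survives,
# only the serialized entries in the store.
durable_store = dict(durable_store)

status = raise_approval_event(durable_store, "trip-1", approved=True)
```

Nothing is held in memory between `start_orchestration` and the approval event, so the wait can last minutes or days without occupying compute. A second event for the same instance leaves the earlier decision in place.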

Demo Video

Travel Assistant Demo Video

Wrap up

For orchestrating agents, I recommend using Durable Execution technologies like Azure Durable Functions, as they offer determinism, reliability, and efficiency. The programming model simplifies the orchestration of agents, ensuring predictable outcomes. It enhances the resiliency of agentic workflows, allowing them to recover seamlessly from their last successful point.

For evidence of customers using Durable Functions in real-world production applications, take a look at this Toyota case study, where they use Durable Functions to orchestrate their multi-agent application, exactly as outlined above.

If you have any questions or thoughts about this, please feel free to comment below. I'd love to hear if you find this interesting or if you're already using durable execution in your agent applications.

Updated Jun 16, 2025

Version 2.0

azure functions

durable functions


Apps on Azure Blog