Apps on Azure Blog

OpenAI Agent SDK Integration with Azure Durable Functions

greenie-msft

Microsoft

Sep 25, 2025

Make your agents resilient to failures and interruptions, so they remain reliable in production environments and critical business scenarios.

Picture this: Your agent authored with the OpenAI Agent SDK is halfway through analyzing 10,000 customer reviews when it hits a rate limit and dies. All that progress? Gone. Your multi-agent workflow that took 30 minutes to orchestrate? Back to square one because of a rate limit throttle.

If you've deployed AI agents in production, you probably know this frustration first-hand. Today, we're announcing a solution that makes your agents reliable: OpenAI Agent SDK Integration with Azure Durable Functions. This integration provides automatic state persistence, enabling your agents to survive any failure and continue exactly where they stopped. No more lost progress, no more starting over, just reliable agents that work.

The Challenge with AI Agents

Building AI agents that work reliably in production environments has proven to be one of the most significant challenges in modern AI development. As agent sophistication increases with complex workflows involving multiple LLM calls, tool executions, and agent hand-offs, the likelihood of encountering failures increases.

This creates a fundamental problem for production AI systems where reliability is essential.

Common failure scenarios include:

Rate Limiting: Agents halt mid-process when hitting API rate limits during LLM calls
Network Timeouts: Workflows terminate due to connectivity issues
System Crashes: Multi-agent systems fail when individual components encounter errors
State Loss: Complex workflows restart from the beginning after any interruption

Traditional approaches force developers to choose between building complex retry logic with significant code changes or accepting unreliable agent behavior. Neither option is suitable for production-grade AI systems that businesses depend on, which is why we’re introducing this integration.
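To make the "complex retry logic" concrete, here is a minimal sketch of the kind of hand-rolled wrapper teams often write around LLM calls. Everything in it is illustrative: RateLimitError, call_with_retry, and make_flaky_llm_call are hypothetical names invented for this example and are not part of the OpenAI Agents SDK or Durable Functions.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for the error an LLM client raises when throttled."""


def call_with_retry(fn, max_attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call fn(), retrying on RateLimitError with exponential backoff and jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the error to the caller
            # The delay doubles on each attempt; jitter spreads out concurrent retries.
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay / 2))


def make_flaky_llm_call(failures_before_success):
    """Build a fake LLM call that is throttled a fixed number of times."""
    state = {"calls": 0}

    def flaky_llm_call():
        state["calls"] += 1
        if state["calls"] <= failures_before_success:
            raise RateLimitError("429 Too Many Requests")
        return "summary of reviews"

    return flaky_llm_call, state
```

Even with a wrapper like this on every call, the retry state lives only in process memory, so a crash or restart still loses all progress. That gap is what durable execution is meant to close.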

Key Benefits of the OpenAI Agent SDK Integration with Azure Durable Functions

Our solution applies durable execution to address these reliability challenges while preserving the familiar OpenAI Agents Python SDK developer experience. The integration enables agent invocations hosted on Azure Functions to run within durable orchestration contexts, where both agent LLM calls and tool calls are executed as durable operations.

This integration delivers significant advantages for production AI systems such as:

Enhanced Agent Resilience: Built-in retry mechanisms for LLM calls and tool executions enable agents to automatically recover from failures and continue from their last successful step
Multi-Agent Orchestration Reliability: Individual agent failures don't crash entire multi-agent workflows, and complex orchestrations maintain state across system restarts
Built-in Observability: Monitor agent progress through the Durable Task Scheduler dashboard with enhanced debugging and detailed execution tracking (available only when the Durable Task Scheduler is the Durable Functions backend)
Seamless Developer Experience: Keep using the OpenAI Agents SDK interface you already know with minimal code changes required to add reliability
Distributed Compute and Scalability: Agent workflows automatically scale across multiple compute instances

Core Integration Components:

These powerful capabilities are enabled through just a few simple additions to your AI application:

durable_openai_agent_orchestrator: Decorator that enables durable execution for agent invocations
run_sync: The existing OpenAI Agents SDK API (Runner.run_sync), which executes your agent with built-in durability when called inside a durable orchestrator
create_activity_tool: Wraps tool calls as durable activities with automatic retry capabilities
State Persistence: Maintains agentic workflow state across failures and restarts

Hello World Example

Let's see how this works in practice. Here's what code written using the OpenAI Agent SDK looks like:

import asyncio
from agents import Agent, Runner

async def main():
    agent = Agent(
        name="Assistant",
        instructions="You only respond in haikus.",
    )
    
    result = await Runner.run(agent, "Tell me about recursion in programming.")
    print(result.final_output)

if __name__ == "__main__":
    asyncio.run(main())  # Start the event loop and run the agent once

With our added durable integration, it becomes:

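import azure.functions as func
import azure.durable_functions as df
from agents import Agent, Runner

# Assumed setup: the Durable Functions app object that the decorators below are attached to
app = df.DFApp(http_auth_level=func.AuthLevel.FUNCTION)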

@app.orchestration_trigger(context_name="context")
@app.durable_openai_agent_orchestrator  # Runs the agent invocation in the context of a durable orchestration
def hello_world(context):
    agent = Agent(
        name="Assistant",
        instructions="You only respond in haikus.",
    )
    
    result = Runner.run_sync(agent, "Tell me about recursion in programming.")  # Provides synchronous execution with built-in durability
    return result.final_output

The Durable Task Scheduler dashboard showcasing the agent LLM call as a durable operation

Notice how little actually changed. We added the app.durable_openai_agent_orchestrator decorator and replaced the awaited Runner.run call with Runner.run_sync, but your core agent logic stays the same. The run_sync method provides execution with built-in durability, enabling your agents to automatically recover from failures with minimal code changes.

When using the Durable Task Scheduler as your Durable Functions backend, you gain access to a detailed monitoring dashboard that provides visibility into your agent executions. The dashboard displays detailed inputs and outputs for both LLM calls and tool invocations, along with clear success/failure indicators, making it straightforward to diagnose and troubleshoot any unexpected behavior in your agent processes.

A note about 'run_sync'

In Durable Functions, orchestrators don’t usually benefit from invoking code asynchronously because their role is to define the workflow (tracking state, scheduling activities, and so on), not to perform the actual work. When you call an activity, the framework records the decision and suspends the orchestrator until the result is ready. For example, when you call run_sync, the deterministic part of the call completes almost instantly, and the LLM call activity is scheduled for asynchronous execution. Adding extra asynchronous code inside the orchestrator doesn’t improve performance; it only breaks determinism and complicates replay.
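The record-and-suspend behavior described above can be illustrated with a toy model in plain Python. This is a simplified simulation, not how Durable Functions is implemented: orchestrator, run_activity, and replay are hypothetical names, and the "activity" runs inline in the driver instead of being scheduled on separate compute.

```python
def orchestrator():
    """Toy workflow: each yield asks the driver to run a named activity."""
    draft = yield ("call_llm", "Tell me about recursion")
    final = yield ("call_llm", f"Polish this: {draft}")
    return final


def run_activity(name, arg, log):
    """Pretend to do the slow, failure-prone work (for example an LLM call)."""
    log.append(arg)  # track how often real work is performed
    return f"<{name} result for {arg!r}>"


def replay(history, log):
    """Re-run the orchestrator from the top, answering completed steps from history.

    Returns ("done", value) when the workflow finishes, or ("suspended", history)
    after executing and recording exactly one new activity.
    """
    gen = orchestrator()
    try:
        request = next(gen)  # run to the first scheduling decision
        for recorded in history:
            # Completed steps are fed from history, never re-executed.
            request = gen.send(recorded)
    except StopIteration as done:
        return "done", done.value
    name, arg = request
    history.append(run_activity(name, arg, log))
    return "suspended", history
```

Calling replay three times with the same history list runs the orchestrator body three times, yet each activity executes only once. That is why the orchestrator itself must be deterministic: every replay has to make the same scheduling decisions in the same order for the recorded results to line up.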

Reliable Tool Invocation Example

For agents requiring tool interactions, there are two implementation approaches. The first option uses the @function_tool decorator from the OpenAI Agents SDK, which executes the tool directly within the durable orchestration. With this approach, your tool functions must follow the determinism constraints that Durable Functions places on orchestrator code. Additionally, because these functions run within the orchestration itself, they may be replayed as part of normal operations, so keep them cheap to re-execute.

from agents import Agent, Runner, function_tool
from pydantic import BaseModel  # Weather below is a Pydantic model

class Weather(BaseModel):
    city: str
    temperature_range: str
    conditions: str

@function_tool
def get_weather(city: str) -> Weather:
    """Get the current weather information for a specified city."""
    print("[debug] get_weather called")
    return Weather(
        city=city, 
        temperature_range="14-20C", 
        conditions="Sunny with wind."
    )

@app.orchestration_trigger(context_name="context")
@app.durable_openai_agent_orchestrator
def tools(context):
    agent = Agent(
        name="Hello world",
        instructions="You are a helpful agent.",
        tools=[get_weather],
    )
    
    result = Runner.run_sync(agent, input="What's the weather in Tokyo?")
    return result.final_output

The second approach uses the create_activity_tool function, which is designed for non-deterministic code or scenarios where rerunning the tool is expensive (in terms of performance or cost). This approach executes the tool within the context of a durable orchestration activity, providing enhanced monitoring through the Durable Task Scheduler dashboard and ensuring that expensive operations are not unnecessarily repeated during orchestration replays.

from agents import Agent, Runner
from pydantic import BaseModel  # Weather below is a Pydantic model

class Weather(BaseModel):
    city: str
    temperature_range: str
    conditions: str

@app.orchestration_trigger(context_name="context")
@app.durable_openai_agent_orchestrator
def weather_expert(context):
    agent = Agent(
        name="Hello world",
        instructions="You are a helpful agent.",
        tools=[
            context.create_activity_tool(get_weather)
        ],
    )
    
    result = Runner.run_sync(agent, "What is the weather in Tokyo?")
    return result.final_output

@app.activity_trigger(input_name="city")
async def get_weather(city: str) -> Weather:
    weather = Weather(
        city=city, 
        temperature_range="14-20C", 
        conditions="Sunny with wind."
    )
    return weather
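Why the choice between the two approaches matters for non-deterministic tools can be shown with a small plain-Python sketch. It is a simulation under stated assumptions, not the integration's code: make_weather_service and run_workflow are hypothetical, and the history list stands in for the recorded activity results that the framework would keep.

```python
import itertools


def make_weather_service():
    """Fake external service whose answer changes on every call (non-deterministic)."""
    counter = itertools.count(1)
    return lambda city: f"{city}: reading #{next(counter)}"


def run_workflow(get_reading, history=None):
    """Run the toy orchestration once; pass a history list to record tool results."""
    if history is None:
        # Inline tool: the body executes again on every replay.
        return get_reading("Tokyo")
    if not history:
        # Activity-style tool: execute once and record the result.
        history.append(get_reading("Tokyo"))
    # Later replays read the recorded value instead of calling the service.
    return history[0]
```

Running the inline variant three times produces three different readings and three service calls, while the recorded variant calls the service once and returns the same reading on every replay.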

Leveraging Durable Functions Stateful App Patterns

Beyond basic durability of agents, this integration provides access to the full Durable Functions orchestration context, enabling developers to implement sophisticated stateful application patterns when needed, such as:

External Event Handling: Use context.wait_for_external_event() for human approvals, external system callbacks, or time-based triggers
Fan-out/Fan-in: Coordinate multiple tasks (including sub-orchestrations invoking agents) in parallel
Long-running Workflows: Implement workflows that span hours, days, or weeks with persistent state
Conditional Logic: Build dynamic agent workflows based on runtime decisions and external inputs
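As a rough illustration of the fan-out/fan-in shape only, the sketch below uses standard-library threads. In Durable Functions the parallel units would be activities or sub-orchestrations scheduled through the orchestration context, not threads; analyze_batch and fan_out_fan_in are hypothetical names, and the keyword count is a placeholder for an agent invocation.

```python
from concurrent.futures import ThreadPoolExecutor


def analyze_batch(reviews):
    """Stand-in for one agent invocation over a batch of reviews."""
    return sum(1 for review in reviews if "good" in review.lower())


def fan_out_fan_in(reviews, batch_size=2):
    """Split the input, process batches in parallel, then combine the results."""
    batches = [reviews[i:i + batch_size] for i in range(0, len(reviews), batch_size)]
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(analyze_batch, batches))  # fan-out
    return sum(partials)  # fan-in
```

The structure is the point here: independent units of work are dispatched together, and a single aggregation step runs only after all of them have returned.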

Human Interaction and Approval Workflows Example

For scenarios requiring human oversight, you can leverage the orchestration context to implement approval workflows:

@app.orchestration_trigger(context_name="context")
@app.durable_openai_agent_orchestrator
def agent_with_approval(context):
    # Run initial agent analysis
    agent = Agent(name="DataAnalyzer", instructions="Analyze the provided dataset")
    initial_result = Runner.run_sync(agent, context.get_input())
    
    # Wait for human approval before proceeding
    approval_event = context.wait_for_external_event("approval_received")
    
    if approval_event.get("approved"):
        # Continue with next phase
        final_agent = Agent(name="Reporter", instructions="Generate final report")
        final_result = Runner.run_sync(final_agent, initial_result.final_output)
        return final_result.final_output
    else:
        return "Workflow cancelled by user"

This flexibility allows you to build sophisticated agentic applications that combine the power of AI agents with enterprise-grade workflow orchestration patterns, all while maintaining the familiar OpenAI Agents SDK experience.

Get Started Today

This article only scratches the surface of what's possible with the OpenAI Agent SDK integration for Durable Functions. The combination of familiar OpenAI Agents SDK patterns with added reliability opens new possibilities for building sophisticated AI systems that can handle real-world production workloads.

The integration is designed for a smooth onboarding experience. Begin by selecting one of your existing agents and applying the transformation patterns demonstrated above (often requiring just a few lines of code changes).

Documentation: https://aka.ms/openai-agents-with-reliability-docs

Sample Applications: https://aka.ms/openai-agents-with-reliability-samples

Updated Sep 25, 2025

Version 1.0
