The shift from proof-of-concept AI agents to production-ready systems isn't just about better models—it's about building robust infrastructure that can develop, deploy, and maintain intelligent agents at enterprise scale. As organizations move beyond simple chatbots to agentic systems that plan, reason, and act autonomously, the need for comprehensive Agent LLMOps becomes critical.
This guide walks through the complete lifecycle for building production AI agents, from development through deployment to monitoring, with special focus on leveraging Azure AI Foundry's hosted agents infrastructure.
The Evolution: From Single-Turn Prompts to Agentic Workflows
Traditional AI applications operated on a simple request-response pattern. Modern AI agents, however, are fundamentally different. They maintain state across multiple interactions, orchestrate complex multi-step workflows, and dynamically adapt their approach based on intermediate results.
In practice, agentic workflows are systems in which language models and tools are orchestrated through a combination of predefined logic and dynamic decision-making. Unlike monolithic systems where a single model attempts everything, production agents break complex tasks down into specialized components that collaborate effectively.
The difference is profound. A simple customer service chatbot might answer questions from a knowledge base. An agentic customer service system, however, can search multiple data sources, escalate to specialized sub-agents for technical issues, draft response emails, schedule follow-up tasks, and learn from each interaction to improve future responses.
Stage 1: Development with Any Agentic Framework
Why LangGraph for Agent Development?
LangGraph has emerged as a leading framework for building stateful, multi-agent applications. Unlike traditional chain-based approaches, LangGraph uses a graph-based architecture where each node represents a unit of work and edges define the workflow paths between them.
The key advantages include:
Explicit State Management: LangGraph maintains persistent state across nodes, making it straightforward to track conversation history, intermediate results, and decision points. This is critical for debugging complex agent behaviors.
Visual Workflow Design: The graph structure provides an intuitive way to visualize and understand agent logic. When an agent misbehaves, you can trace execution through the graph to identify where things went wrong.
Flexible Control Flows: LangGraph supports diverse orchestration patterns—single agent, multi-agent, hierarchical, sequential—all within one framework. You can start simple and evolve as requirements grow.
Built-in Memory: Agents automatically store conversation histories and maintain context over time, enabling rich personalized interactions across sessions.
Core LangGraph Components
Nodes: Individual units of logic or action. A node might call an AI model, query a database, invoke an external API, or perform data transformation. Each node is a Python function that receives the current state and returns updates.
Edges: Define the workflow paths between nodes. These can be conditional (routing based on the node's output) or unconditional (always proceeding to the next step).
State: The data structure passed between nodes and updated through reducers. Proper state design is crucial—it should contain all information needed for decision-making while remaining manageable in size.
Checkpoints: LangGraph's checkpointing mechanism saves state at each node, enabling features like human-in-the-loop approval, retry logic, and debugging.
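A minimal sketch of switching checkpointing on, using the in-memory MemorySaver that ships with LangGraph; `workflow` stands for a StateGraph like the one built in the example below, and production systems would swap in a durable checkpointer:
from langgraph.checkpoint.memory import MemorySaver

# Persist state after every node so a run can be resumed, inspected,
# or paused for human approval. Swap MemorySaver for a durable backend
# in production.
checkpointer = MemorySaver()
agent = workflow.compile(checkpointer=checkpointer)

# Each thread_id identifies one conversation whose state is checkpointed.
config = {"configurable": {"thread_id": "user-123"}}
state = agent.invoke(
    {"query": "Latest LLMOps practices", "plan": [],
     "current_step": 0, "research_results": []},
    config=config,
)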
Implementing the Agentic Workflow Pattern
A robust production agent typically follows a cyclical pattern of planning, execution, reflection, and adaptation:
- Planning Phase: The agent analyzes the user's request and creates a structured plan, breaking complex problems into manageable steps.
- Execution Phase: The agent carries out planned actions using appropriate tools—search engines, calculators, code interpreters, database queries, or API calls.
- Reflection Phase: After each action, the agent evaluates results against expected outcomes. This critical thinking step determines whether to proceed, retry with a different approach, or seek additional information.
- Decision Phase: Based on reflection, the agent decides the next course of action—continue to the next step, loop back to refine the approach, or conclude with a final response.
This pattern handles real-world complexity far better than simple linear workflows. When an agent encounters unexpected results, the reflection phase enables adaptive responses rather than brittle failure.
Example: Building a Research Agent with LangGraph
from langgraph.graph import StateGraph, END
from langchain_openai import ChatOpenAI
from typing import TypedDict, List

class AgentState(TypedDict):
    query: str
    plan: List[str]
    current_step: int
    research_results: List[dict]
    next: str
    final_answer: str

def planning_node(state: AgentState):
    # Agent creates a research plan, one step per line
    llm = ChatOpenAI(model="gpt-4")
    response = llm.invoke(
        f"Create a research plan for: {state['query']}. Return one step per line."
    )
    plan = [line.strip() for line in response.content.splitlines() if line.strip()]
    return {"plan": plan, "current_step": 0}

def research_node(state: AgentState):
    # Execute the current research step
    step = state['plan'][state['current_step']]
    # perform_research is a placeholder for a web search, database query, etc.
    results = perform_research(step)
    return {"research_results": state['research_results'] + [results]}

def reflection_node(state: AgentState):
    # Evaluate whether we have enough information
    if len(state['research_results']) >= len(state['plan']):
        return {"next": "synthesize"}
    return {"next": "research", "current_step": state['current_step'] + 1}

def synthesize_node(state: AgentState):
    # Generate the final answer from all research
    llm = ChatOpenAI(model="gpt-4")
    answer = llm.invoke(f"Synthesize research: {state['research_results']}")
    return {"final_answer": answer.content}

# Build the graph
workflow = StateGraph(AgentState)
workflow.add_node("planning", planning_node)
workflow.add_node("research", research_node)
workflow.add_node("reflection", reflection_node)
workflow.add_node("synthesize", synthesize_node)

workflow.set_entry_point("planning")
workflow.add_edge("planning", "research")
workflow.add_edge("research", "reflection")
workflow.add_conditional_edges(
    "reflection",
    lambda s: s["next"],
    {"research": "research", "synthesize": "synthesize"}
)
workflow.add_edge("synthesize", END)

agent = workflow.compile()
# Invoke with an initial state, e.g.:
# agent.invoke({"query": "...", "plan": [], "current_step": 0, "research_results": []})
This pattern scales from simple workflows to complex multi-agent systems with dozens of specialized nodes.
Stage 2: CI/CD Pipeline for AI Agents
Traditional software CI/CD focuses on code quality, security, and deployment automation. Agent CI/CD must additionally handle model versioning, evaluation against behavioral benchmarks, and non-deterministic behavior.
Build Phase: Packaging Agent Dependencies
Unlike traditional applications, AI agents have unique packaging requirements:
- Model artifacts: Fine-tuned models, embeddings, or model configurations
- Vector databases: Pre-computed embeddings for knowledge retrieval
- Tool configurations: API credentials, endpoint URLs, rate limits
- Prompt templates: Versioned prompt engineering assets
- Evaluation datasets: Test cases for agent behavior validation
Best practice is to containerize everything. Docker provides reproducibility across environments and simplifies dependency management:
FROM python:3.11-slim
WORKDIR /app
COPY . user_agent/
WORKDIR /app/user_agent
RUN if [ -f requirements.txt ]; then \
pip install -r requirements.txt; \
else \
echo "No requirements.txt found"; \
fi
EXPOSE 8088
CMD ["python", "main.py"]
Register Phase: Version Control Beyond Git
Code versioning is necessary but insufficient for AI agents. You need comprehensive artifact versioning:
Container Registry: Azure Container Registry stores Docker images with semantic versioning. Each agent version becomes an immutable artifact that can be deployed or rolled back at any time.
Prompt Registry: Version control your prompts separately from code. Prompt changes can dramatically impact agent behavior, so treating them as first-class artifacts enables A/B testing and rapid iteration.
Configuration Management: Store agent configurations (model selection, temperature, token limits, tool permissions) in version-controlled files. This ensures reproducibility and enables easy rollback.
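As a small illustrative sketch of configuration-as-code (the file name and fields are hypothetical), the agent loads this at startup, so rolling back the repository also rolls back model choice and tool permissions:
import json
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class AgentConfig:
    model: str                # e.g. "gpt-4"
    temperature: float        # sampling temperature
    max_tokens: int           # hard cap per response
    allowed_tools: List[str]  # tool permissions for this agent version

def load_config(path: str = "agent_config.json") -> AgentConfig:
    # The config file is committed to git next to the agent code.
    with open(path) as f:
        return AgentConfig(**json.load(f))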
Evaluate Phase: Testing Non-Deterministic Behavior
The biggest challenge in agent CI/CD is evaluation. Unlike traditional software where unit tests verify exact outputs, agents produce variable responses that must be evaluated holistically.
Behavioral Testing: Define test cases that specify desired agent behaviors rather than exact outputs. For example, "When asked about product pricing, the agent should query the pricing API, handle rate limits gracefully, and present information in a structured format."
Evaluation Metrics: Track multiple dimensions:
- Task completion rate: Did the agent accomplish the goal?
- Tool usage accuracy: Did it call the right tools with correct parameters?
- Response quality: Measured via LLM-as-judge or human evaluation
- Latency: Time to first token and total response time
- Cost: Token usage and API call expenses
Adversarial Testing: Intentionally test edge cases—ambiguous requests, tool failures, rate limiting, conflicting information. Production agents will encounter these scenarios.
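A minimal sketch of a behavioral test with LLM-as-judge (the judge prompt, scoring scale, and canned answer are illustrative; in a real pipeline the answer comes from invoking the agent under test):
from langchain_openai import ChatOpenAI

JUDGE_PROMPT = """You are grading an AI agent's answer.
Question: {question}
Agent answer: {answer}
Expected behavior: {expected}
Reply with a single integer score from 1 (fails) to 5 (fully satisfies)."""

def judge_response(question: str, answer: str, expected: str) -> int:
    # LLM-as-judge: score the answer against a behavioral spec rather than
    # asserting an exact string match.
    judge = ChatOpenAI(model="gpt-4", temperature=0)
    reply = judge.invoke(JUDGE_PROMPT.format(
        question=question, answer=answer, expected=expected))
    return int(reply.content.strip())

def test_pricing_behavior():
    # In CI, `answer` would come from the agent under test.
    answer = "The Pro plan costs $49/month, billed annually."
    score = judge_response(
        question="How much does the Pro plan cost?",
        answer=answer,
        expected="Retrieves pricing data and presents it in a structured format",
    )
    assert score >= 4, f"Behavioral score too low: {score}"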
Recent research on CI/CD for AI agents emphasizes comprehensive instrumentation from day one. Track every input, output, API call, token usage, and decision point. After accumulating production data, patterns emerge showing which metrics actually predict failures versus noise.
Deploy Phase: Safe Production Rollouts
Never deploy agents directly to production. Implement progressive delivery:
Staging Environment: Deploy to a staging environment that mirrors production. Run automated tests and manual QA against real data (appropriately anonymized).
Canary Deployment: Route a small percentage of traffic (5-10%) to the new version. Monitor error rates, latency, user satisfaction, and cost metrics. Automatically roll back if any metric degrades beyond its threshold (a gating sketch follows this list).
Blue-Green Deployment: Maintain two production environments. Deploy to the inactive environment, verify it's healthy, then switch traffic. Enables instant rollback by switching back.
Feature Flags: Deploy new agent capabilities behind feature flags. Gradually enable them for specific user segments, gather feedback, and iterate before full rollout.
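A sketch of the canary gate referenced above (metric names, thresholds, and the sample values are illustrative; in practice they come from your monitoring backend):
def canary_is_healthy(baseline: dict, canary: dict) -> bool:
    # Reject the canary if error rate, latency, or cost regress beyond
    # tolerance, or if task completion drops.
    return all([
        canary["error_rate"] <= baseline["error_rate"] * 1.2,
        canary["p95_latency_ms"] <= baseline["p95_latency_ms"] * 1.3,
        canary["cost_per_conversation"] <= baseline["cost_per_conversation"] * 1.5,
        canary["task_completion_rate"] >= baseline["task_completion_rate"] * 0.95,
    ])

# Illustrative metrics for the stable version and the 5-10% canary slice.
baseline = {"error_rate": 0.02, "p95_latency_ms": 1800,
            "cost_per_conversation": 0.04, "task_completion_rate": 0.91}
canary = {"error_rate": 0.021, "p95_latency_ms": 1750,
          "cost_per_conversation": 0.043, "task_completion_rate": 0.93}

if canary_is_healthy(baseline, canary):
    print("Promote canary to 100% of traffic")
else:
    print("Roll back canary and keep the stable version")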
Now that we know how to create an agent using LangGraph, the next step is to understand how to deploy this LangGraph agent in Azure AI Foundry.
Stage 3: Azure AI Foundry Hosted Agents
Hosted agents are containerized agentic AI applications that run on Microsoft's Foundry Agent Service. They represent a paradigm shift from traditional prompt-based agents to fully code-driven, production-ready AI systems.
When to Use Hosted Agents:
✅ Complex agentic workflows - Multi-step reasoning, branching logic, conditional execution
✅ Custom tool integration - External APIs, databases, internal systems
✅ Framework-specific features - LangGraph graphs, multi-agent orchestration
✅ Production scale - Enterprise deployments requiring autoscaling
✅ Auth and governance - Identity and authentication, security and compliance controls
✅ CI/CD integration - Automated testing and deployment pipelines
Why Hosted Agents Matter
Hosted agents bridge the gap between experimental AI prototypes and production systems:
For Developers:
- Full control over agent logic via code
- Use familiar frameworks and tools
- Local testing before deployment
- Version control for agent code
For Enterprises:
- No infrastructure management overhead
- Built-in security and compliance
- Scalable pay-as-you-go pricing
- Integration with existing Azure ecosystem
For AI Systems:
- Complex reasoning patterns beyond prompts
- Stateful conversations with persistence
- Custom tool integration and orchestration
- Multi-agent collaboration
Before you get started with Foundry, deploy a Foundry project from the starter template using the Azure Developer CLI (azd):
# Initialize a new agent project
azd init -t https://github.com/Azure-Samples/azd-ai-starter-basic
# The template automatically provisions:
# - Foundry resource and project
# - Azure Container Registry
# - Application Insights for monitoring
# - Managed identities and RBAC
# Deploy
azd up
The extension significantly reduces the operational burden. What previously required extensive Azure knowledge and infrastructure-as-code expertise now works with a few CLI commands.
Local Development to Production Workflow
A streamlined workflow bridges development and production:
- Develop Locally: Build and test your LangGraph agent on your machine. Use the Foundry SDK to ensure compatibility with production APIs.
- Validate Locally: Run the agent locally against the Foundry Responses API to verify it works with production authentication and conversation management.
- Containerize: Package your agent in a Docker container with all dependencies.
- Deploy to Staging: Use azd deploy to push to a staging Foundry project. Run automated tests.
- Deploy to Production: Once validated, deploy to production. Foundry handles versioning, so you can maintain multiple agent versions and route traffic accordingly.
- Monitor and Iterate: Use Application Insights to monitor agent performance, identify issues, and plan improvements.
The Azure AI Toolkit for Visual Studio offers a great place to test your hosted agent.
You can also test it over REST, as sketched below.
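A rough sketch of what a REST smoke test might look like (the endpoint path, payload shape, and api-version below are assumptions; check the Foundry Responses API reference for the exact contract of your project):
import requests
from azure.identity import DefaultAzureCredential

# Acquire a token the same way the container-status script later in this
# guide does.
token = DefaultAzureCredential().get_token("https://ml.azure.com/.default").token

# Hypothetical request shape -- consult the Foundry docs for the exact
# path, payload, and api-version exposed by your project.
resp = requests.post(
    "https://<your-project-endpoint>/responses?api-version=2025-11-15-preview",
    headers={"Authorization": f"Bearer {token}",
             "Content-Type": "application/json"},
    json={"agent": "<your-agent-name>", "input": "What is 21 * 2?"},
    timeout=60,
)
print(resp.status_code, resp.json())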
Once you are able to run the agent and test it in the local playground, you can move on to registering, evaluating, and deploying the agent in Azure AI Foundry.
CI/CD with GitHub Actions
This repository includes a GitHub Actions workflow (`.github/workflows/mslearnagent-AutoDeployTrigger.yml`) that automatically builds and deploys the agent to Azure when changes are pushed to the main branch.
1. Set Up Service Principal
# Create service principal
az ad sp create-for-rbac \
--name "github-actions-agent-deploy" \
--role contributor \
--scopes /subscriptions/$SUBSCRIPTION_ID/resourceGroups/$RESOURCE_GROUP
# Output will include:
# - appId (AZURE_CLIENT_ID)
# - tenant (AZURE_TENANT_ID)
2. Configure Federated Credentials
# For GitHub Actions OIDC
az ad app federated-credential create \
--id $APP_ID \
--parameters '{
"name": "github-actions-deploy",
"issuer": "https://token.actions.githubusercontent.com",
"subject": "repo:YOUR_ORG/YOUR_REPO:ref:refs/heads/main",
"audiences": ["api://AzureADTokenExchange"]
}'
3. Set Required Permissions
Critical: the service principal needs the Azure AI User role on the AI Services resource:
# Get AI Services resource ID
AI_SERVICES_ID=$(az cognitiveservices account show \
--name $AI_SERVICES_NAME \
--resource-group $RESOURCE_GROUP \
--query id -o tsv)
# Assign Azure AI User role
az role assignment create \
--assignee $SERVICE_PRINCIPAL_ID \
--role "Azure AI User" \
--scope $AI_SERVICES_ID
4. Configure GitHub Secrets
Go to GitHub repository → Settings → Secrets and variables → Actions
Add the following secrets:
AZURE_CLIENT_ID=<from-service-principal>
AZURE_TENANT_ID=<from-service-principal>
AZURE_SUBSCRIPTION_ID=<your-subscription-id>
AZURE_AI_PROJECT_ENDPOINT=<your-project-endpoint>
ACR_NAME=<your-acr-name>
IMAGE_NAME=calculator-agent
AGENT_NAME=CalculatorAgent
5. Create GitHub Actions Workflow
Create .github/workflows/deploy-agent.yml:
name: Deploy Agent to Azure AI Foundry
on:
push:
branches:
- main
paths:
- 'main.py'
- 'custom_state_converter.py'
- 'requirements.txt'
- 'Dockerfile'
workflow_dispatch:
inputs:
version_tag:
description: 'Version tag (leave empty for auto-increment)'
required: false
type: string
permissions:
id-token: write
contents: read
jobs:
deploy:
runs-on: ubuntu-latest
steps:
- name: Checkout code
uses: actions/checkout@v4
- name: Generate version tag
id: version
run: |
if [ -n "${{ github.event.inputs.version_tag }}" ]; then
echo "VERSION=${{ github.event.inputs.version_tag }}" >> $GITHUB_OUTPUT
else
# Auto-increment version
VERSION="v$(date +%Y%m%d-%H%M%S)"
echo "VERSION=$VERSION" >> $GITHUB_OUTPUT
fi
- name: Azure Login (OIDC)
uses: azure/login@v1
with:
client-id: ${{ secrets.AZURE_CLIENT_ID }}
tenant-id: ${{ secrets.AZURE_TENANT_ID }}
subscription-id: ${{ secrets.AZURE_SUBSCRIPTION_ID }}
- name: Set up Python
uses: actions/setup-python@v4
with:
python-version: '3.11'
- name: Install Azure AI SDK
run: |
pip install azure-ai-projects azure-identity
- name: Build and push Docker image
run: |
az acr build \
--registry ${{ secrets.ACR_NAME }} \
--image ${{ secrets.IMAGE_NAME }}:${{ steps.version.outputs.VERSION }} \
--image ${{ secrets.IMAGE_NAME }}:latest \
--file Dockerfile \
.
- name: Register agent version
env:
AZURE_AI_PROJECT_ENDPOINT: ${{ secrets.AZURE_AI_PROJECT_ENDPOINT }}
ACR_NAME: ${{ secrets.ACR_NAME }}
IMAGE_NAME: ${{ secrets.IMAGE_NAME }}
AGENT_NAME: ${{ secrets.AGENT_NAME }}
VERSION: ${{ steps.version.outputs.VERSION }}
run: |
python - <<EOF
import os
from azure.ai.projects import AIProjectClient
from azure.identity import DefaultAzureCredential
from azure.ai.projects.models import ImageBasedHostedAgentDefinition
project_endpoint = os.environ["AZURE_AI_PROJECT_ENDPOINT"]
credential = DefaultAzureCredential()
project_client = AIProjectClient.from_connection_string(
credential=credential,
conn_str=project_endpoint,
)
agent_name = os.environ["AGENT_NAME"]
version = os.environ["VERSION"]
image_uri = f"{os.environ['ACR_NAME']}.azurecr.io/{os.environ['IMAGE_NAME']}:{version}"
agent_definition = ImageBasedHostedAgentDefinition(
image=image_uri,
cpu=1.0,
memory="2Gi",
)
agent = project_client.agents.create_or_update(
agent_id=agent_name,
body=agent_definition
)
print(f"Agent version registered: {agent.version}")
EOF
- name: Start agent
run: |
echo "Agent deployed successfully with version ${{ steps.version.outputs.VERSION }}"
- name: Deployment summary
run: |
echo "### Deployment Summary" >> $GITHUB_STEP_SUMMARY
echo "- **Agent Name**: ${{ secrets.AGENT_NAME }}" >> $GITHUB_STEP_SUMMARY
echo "- **Version**: ${{ steps.version.outputs.VERSION }}" >> $GITHUB_STEP_SUMMARY
echo "- **Image**: ${{ secrets.ACR_NAME }}.azurecr.io/${{ secrets.IMAGE_NAME }}:${{ steps.version.outputs.VERSION }}" >> $GITHUB_STEP_SUMMARY
echo "- **Status**: Deployed" >> $GITHUB_STEP_SUMMARY
6. Add Container Status Verification
To ensure deployments are truly successful, add a script to verify container startup before marking the pipeline as complete.
Create wait_for_container.py:
"""
Wait for agent container to be ready.
This script polls the agent container status until it's running successfully
or times out. Designed for use in CI/CD pipelines to verify deployment.
"""
import os
import sys
import time
import requests
from typing import Optional, Dict, Any
from azure.identity import DefaultAzureCredential
class ContainerStatusWaiter:
"""Polls agent container status until ready or timeout."""
def __init__(
self,
project_endpoint: str,
agent_name: str,
agent_version: str,
timeout_seconds: int = 600,
poll_interval: int = 10,
):
"""
Initialize the container status waiter.
Args:
project_endpoint: Azure AI Foundry project endpoint
agent_name: Name of the agent
agent_version: Version of the agent
timeout_seconds: Maximum time to wait (default: 10 minutes)
poll_interval: Seconds between status checks (default: 10s)
"""
self.project_endpoint = project_endpoint.rstrip("/")
self.agent_name = agent_name
self.agent_version = agent_version
self.timeout_seconds = timeout_seconds
self.poll_interval = poll_interval
self.api_version = "2025-11-15-preview"
# Get Azure AD token
credential = DefaultAzureCredential()
token = credential.get_token("https://ml.azure.com/.default")
self.headers = {
"Authorization": f"Bearer {token.token}",
"Content-Type": "application/json",
}
def _get_container_url(self) -> str:
"""Build the container status URL."""
return (
f"{self.project_endpoint}/agents/{self.agent_name}"
f"/versions/{self.agent_version}/containers/default"
)
def get_container_status(self) -> Optional[Dict[str, Any]]:
"""Get current container status."""
url = f"{self._get_container_url()}?api-version={self.api_version}"
try:
response = requests.get(url, headers=self.headers, timeout=30)
if response.status_code == 200:
return response.json()
elif response.status_code == 404:
return None
else:
print(f"⚠️ Warning: GET container returned {response.status_code}")
return None
except Exception as e:
print(f"⚠️ Warning: Error getting container status: {e}")
return None
def wait_for_container_running(self) -> bool:
"""
Wait for container to reach running state.
Returns:
True if container is running, False if timeout or error
"""
print(f"\n🔍 Checking container status for {self.agent_name} v{self.agent_version}")
print(f"⏱️ Timeout: {self.timeout_seconds}s | Poll interval: {self.poll_interval}s")
print("-" * 70)
start_time = time.time()
iteration = 0
while time.time() - start_time < self.timeout_seconds:
iteration += 1
elapsed = int(time.time() - start_time)
container = self.get_container_status()
if not container:
print(f"[{iteration}] ({elapsed}s) ⏳ Container not found yet, waiting...")
time.sleep(self.poll_interval)
continue
# Extract status information
status = (
container.get("status")
or container.get("state")
or container.get("provisioningState")
or "Unknown"
)
# Check for replicas information
replicas = container.get("replicas", {})
ready_replicas = replicas.get("ready", 0)
desired_replicas = replicas.get("desired", 0)
print(f"[{iteration}] ({elapsed}s) 📊 Status: {status}")
if replicas:
print(f" 🔢 Replicas: {ready_replicas}/{desired_replicas} ready")
# Check if container is running and ready
if status.lower() in ["running", "succeeded", "ready"]:
if desired_replicas == 0 or ready_replicas >= desired_replicas:
print("\n" + "=" * 70)
print("✅ Container is running and ready!")
print("=" * 70)
return True
elif status.lower() in ["failed", "error", "cancelled"]:
print("\n" + "=" * 70)
print(f"❌ Container failed to start: {status}")
print("=" * 70)
return False
time.sleep(self.poll_interval)
# Timeout reached
print("\n" + "=" * 70)
print(f"⏱️ Timeout reached after {self.timeout_seconds}s")
print("=" * 70)
return False
def main():
"""Main entry point for CLI usage."""
project_endpoint = os.getenv("AZURE_AI_PROJECT_ENDPOINT")
agent_name = os.getenv("AGENT_NAME")
agent_version = os.getenv("AGENT_VERSION")
timeout = int(os.getenv("TIMEOUT_SECONDS", "600"))
poll_interval = int(os.getenv("POLL_INTERVAL_SECONDS", "10"))
if not all([project_endpoint, agent_name, agent_version]):
print("❌ Error: Missing required environment variables")
sys.exit(1)
waiter = ContainerStatusWaiter(
project_endpoint=project_endpoint,
agent_name=agent_name,
agent_version=agent_version,
timeout_seconds=timeout,
poll_interval=poll_interval,
)
success = waiter.wait_for_container_running()
sys.exit(0 if success else 1)
if __name__ == "__main__":
main()
Key Features:
- REST API Polling: Uses Azure AI Foundry REST API to check container status
- Timeout Handling: Configurable timeout (default 10 minutes)
- Progress Tracking: Shows iteration count and elapsed time
- Replica Checking: Verifies all desired replicas are ready
- Clear Output: Emoji-enhanced status messages for easy reading
- Exit Codes: Returns 0 for success, 1 for failure (CI/CD friendly)
Update workflow to include verification:
Add this step after starting the agent:
- name: Start the new agent version
id: start_agent
env:
FOUNDRY_ACCOUNT: ${{ steps.foundry_details.outputs.FOUNDRY_ACCOUNT }}
PROJECT_NAME: ${{ steps.foundry_details.outputs.PROJECT_NAME }}
AGENT_NAME: ${{ secrets.AGENT_NAME }}
run: |
LATEST_VERSION=$(az cognitiveservices agent show \
--account-name "$FOUNDRY_ACCOUNT" \
--project-name "$PROJECT_NAME" \
--name "$AGENT_NAME" \
--query "versions.latest.version" -o tsv)
echo "AGENT_VERSION=$LATEST_VERSION" >> $GITHUB_OUTPUT
az cognitiveservices agent start \
--account-name "$FOUNDRY_ACCOUNT" \
--project-name "$PROJECT_NAME" \
--name "$AGENT_NAME" \
--agent-version $LATEST_VERSION
- name: Wait for container to be ready
env:
AZURE_AI_PROJECT_ENDPOINT: ${{ secrets.AZURE_AI_PROJECT_ENDPOINT }}
AGENT_NAME: ${{ secrets.AGENT_NAME }}
AGENT_VERSION: ${{ steps.start_agent.outputs.AGENT_VERSION }}
TIMEOUT_SECONDS: 600
POLL_INTERVAL_SECONDS: 15
run: |
echo "⏳ Waiting for container to be ready..."
python wait_for_container.py
- name: Deployment Summary
if: success()
run: |
echo "## Deployment Complete! 🚀" >> $GITHUB_STEP_SUMMARY
echo "" >> $GITHUB_STEP_SUMMARY
echo "- **Agent**: ${{ secrets.AGENT_NAME }}" >> $GITHUB_STEP_SUMMARY
echo "- **Version**: ${{ steps.version.outputs.VERSION }}" >> $GITHUB_STEP_SUMMARY
echo "- **Status**: ✅ Container running and ready" >> $GITHUB_STEP_SUMMARY
echo "" >> $GITHUB_STEP_SUMMARY
echo "### Deployment Timeline" >> $GITHUB_STEP_SUMMARY
echo "1. ✅ Image built and pushed to ACR" >> $GITHUB_STEP_SUMMARY
echo "2. ✅ Agent version registered" >> $GITHUB_STEP_SUMMARY
echo "3. ✅ Container started" >> $GITHUB_STEP_SUMMARY
echo "4. ✅ Container verified as running" >> $GITHUB_STEP_SUMMARY
- name: Deployment Failed Summary
if: failure()
run: |
echo "## Deployment Failed ❌" >> $GITHUB_STEP_SUMMARY
echo "" >> $GITHUB_STEP_SUMMARY
echo "Please check the logs above for error details." >> $GITHUB_STEP_SUMMARY
Benefits of Container Status Verification:
- Deployment Confidence: Know for certain that the container started successfully
- Early Failure Detection: Catch startup errors before users are affected
- CI/CD Gate: Pipeline only succeeds when container is actually ready
- Debugging Aid: Clear logs show container startup progress
- Timeout Protection: Prevents infinite waits with configurable timeout
REST API Endpoints Used:
GET {endpoint}/agents/{agent_name}/versions/{agent_version}/containers/default?api-version=2025-11-15-preview
Response includes:
- status or state: Container state (Running, Failed, etc.)
- replicas.ready: Number of ready replicas
- replicas.desired: Target number of replicas
- error: Error details if failed
Container States:
- Running/Ready: Container is operational
- InProgress: Container is starting up
- Failed/Error: Container failed to start
- Stopped: Container was stopped
7. Trigger Deployment
# Automatic trigger - push to main
git add .
git commit -m "Update agent implementation"
git push origin main
# Manual trigger - via GitHub UI
# Go to Actions → Deploy Agent to Azure AI Foundry → Run workflow
This will trigger the workflow as soon as you check in the implementation code.
You can play with the agent in the Foundry UI.
Evaluation is now part of the workflow.
You can also visualize the evaluation results in AI Foundry.
Best Practices for Production Agent LLMOps
1. Start with Simple Workflows, Add Complexity Gradually
Don't build a complex multi-agent system on day one. Start with a single agent that does one task well. Once that's stable in production, add additional capabilities:
- Single agent with basic tool calling
- Add memory/state for multi-turn conversations
- Introduce specialized sub-agents for complex tasks
- Implement multi-agent collaboration
This incremental approach reduces risk and enables learning from real usage before investing in advanced features.
2. Instrument Everything from Day One
The worst time to add observability is after you have a production incident. Comprehensive instrumentation should be part of your initial development:
- Log every LLM call with inputs, outputs, token usage
- Track all tool invocations
- Record decision points in agent reasoning
- Capture timing metrics for every operation
- Log errors with full context
After accumulating production data, you'll identify which metrics matter most. But you can't retroactively add logging for incidents that already occurred.
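A minimal sketch of structured call-site instrumentation (logger setup and field names are up to you; in the Foundry deployment above, these records would typically land in Application Insights):
import json
import logging
import time

logger = logging.getLogger("agent.telemetry")

def log_llm_call(node: str, prompt: str, response: str, usage: dict) -> None:
    # One structured record per LLM call: inputs, outputs, token usage,
    # and timing live together so incidents can be reconstructed later.
    logger.info(json.dumps({
        "event": "llm_call",
        "node": node,
        "prompt_chars": len(prompt),
        "response_preview": response[:200],
        "prompt_tokens": usage.get("prompt_tokens"),
        "completion_tokens": usage.get("completion_tokens"),
        "timestamp": time.time(),
    }))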
3. Build Evaluation into the Development Process
Don't wait until deployment to evaluate agent quality. Integrate evaluation throughout development:
- Maintain a growing set of test conversations
- Run evaluations on every code change
- Track metrics over time to identify regressions
- Include diverse scenarios—happy path, edge cases, adversarial inputs
Use LLM-as-judge for scalable automated evaluation, supplemented with periodic human review of sample outputs.
4. Embrace Non-Determinism, But Set Boundaries
Agents are inherently non-deterministic, but that doesn't mean anything goes:
- Set acceptable ranges for variability in testing
- Use temperature and sampling controls to manage randomness
- Implement retry logic with exponential backoff (sketched after this list)
- Add fallback behaviors for when primary approaches fail
- Use assertions to verify critical invariants (e.g., "agent must never perform destructive actions without confirmation")
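The retry point above, as a small sketch (attempt counts and delays are illustrative; narrow the exception handling to genuinely transient errors in real code):
import random
import time

def call_with_backoff(fn, max_attempts: int = 4, base_delay: float = 1.0):
    # Retry transient failures (rate limits, timeouts) with exponential
    # backoff plus jitter; re-raise once the attempt budget is exhausted.
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception as exc:  # restrict to transient error types in practice
            if attempt == max_attempts:
                raise
            delay = base_delay * (2 ** (attempt - 1)) + random.uniform(0, 0.5)
            print(f"Attempt {attempt} failed ({exc}); retrying in {delay:.1f}s")
            time.sleep(delay)

# Usage: wrap any flaky call, e.g. an LLM or tool invocation.
# result = call_with_backoff(lambda: llm.invoke(prompt))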
5. Prioritize Security and Governance from Day One
Security shouldn't be an afterthought:
- Use managed identities and RBAC for all resource access
- Implement least-privilege principles—agents get only necessary permissions
- Add content filtering for inputs and outputs
- Monitor for prompt injection and jailbreak attempts
- Maintain audit logs for compliance
- Regularly review and update security policies
6. Design for Failure
Your agents will fail. Design systems that degrade gracefully:
- Implement retry logic for transient failures
- Provide clear error messages to users
- Include fallback behaviors (e.g., escalate to human support)
- Never leave users stuck—always provide a path forward
- Log failures with full context for post-incident analysis
7. Balance Automation with Human Oversight
Fully autonomous agents are powerful but risky. Consider human-in-the-loop workflows for high-stakes decisions:
- Draft responses that require approval before sending
- Request confirmation before executing destructive actions (see the sketch after this list)
- Escalate ambiguous situations to human operators
- Provide clear audit trails of agent actions
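In LangGraph, the confirmation gate above maps naturally onto checkpointing plus interrupts; a sketch, assuming a graph with a hypothetical "execute_action" node that performs the destructive step:
from langgraph.checkpoint.memory import MemorySaver

# Pause the graph before the node that performs a destructive action so a
# human can inspect the checkpointed state and approve or reject it.
# "execute_action" is a hypothetical node name in your own graph.
agent = workflow.compile(
    checkpointer=MemorySaver(),
    interrupt_before=["execute_action"],
)

config = {"configurable": {"thread_id": "ticket-42"}}
agent.invoke({"query": "Delete the customer's stale records"}, config=config)

# After a human reviews the pending state, resume from the checkpoint:
# agent.invoke(None, config=config)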
8. Manage Costs Proactively
LLM API costs can escalate quickly at scale:
- Monitor token usage per conversation
- Set per-conversation token limits (a budget sketch follows this list)
- Use caching for repeated queries
- Choose appropriate models (not always the largest)
- Consider local models for suitable use cases
- Alert on cost anomalies that indicate runaway loops
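A sketch of the per-conversation budget referenced above (the cap and blended price are illustrative):
class TokenBudget:
    # Tracks token usage per conversation and enforces a hard cap so a
    # runaway loop cannot burn an unbounded number of tokens.
    def __init__(self, max_tokens: int = 50_000, price_per_1k: float = 0.01):
        self.max_tokens = max_tokens
        self.price_per_1k = price_per_1k  # illustrative blended $/1K tokens
        self.used = 0

    def record(self, prompt_tokens: int, completion_tokens: int) -> None:
        self.used += prompt_tokens + completion_tokens
        if self.used > self.max_tokens:
            raise RuntimeError(
                f"Token budget exceeded: {self.used}/{self.max_tokens}")

    @property
    def estimated_cost(self) -> float:
        return self.used / 1000 * self.price_per_1k

# Usage: one budget per conversation, updated after every model call.
budget = TokenBudget(max_tokens=20_000)
budget.record(prompt_tokens=850, completion_tokens=300)
print(f"Spent so far: ${budget.estimated_cost:.4f}")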
9. Plan for Continuous Learning
Agents should improve over time:
- Collect feedback on agent responses (thumbs up/down)
- Analyze conversations that required escalation
- Identify common failure patterns
- Fine-tune models on production interaction data (with appropriate consent)
- Iterate on prompts based on real usage
- Share learnings across the team
10. Document Everything
Comprehensive documentation is critical as teams scale:
- Agent architecture and design decisions
- Tool configurations and API contracts
- Deployment procedures and runbooks
- Incident response procedures
- Version migration guides
- Evaluation methodologies
Conclusion
You now have a complete, production-ready AI agent deployed to Azure AI Foundry with:
- LangGraph-based agent orchestration
- Tool-calling capabilities
- Multi-turn conversation support
- Containerized deployment
- CI/CD automation
- Evaluation framework
- Multiple client implementations
Key Takeaways
- LangGraph provides flexible agent orchestration with state management
- Azure AI Agent Server SDK simplifies deployment to Azure AI Foundry
- Custom state converter is critical for production deployments with tool calls
- CI/CD automation enables rapid iteration and deployment
- Evaluation framework ensures agent quality and performance
Resources
- Azure AI Foundry Documentation
- LangGraph Documentation
- Azure AI Agent Server SDK
- OpenAI Responses API
Thanks
Manoranjan Rajguru
https://www.linkedin.com/in/manoranjan-rajguru/