The shift from proof-of-concept AI agents to production-ready systems isn't just about better models—it's about building robust infrastructure that can develop, deploy, and maintain intelligent agents at enterprise scale. As organizations move beyond simple chatbots to agentic systems that plan, reason, and act autonomously, the need for comprehensive Agent LLMOps becomes critical.
This guide walks through the complete lifecycle for building production AI agents, from development through deployment to monitoring, with special focus on leveraging Azure AI Foundry's hosted agents infrastructure.
The Evolution: From Single-Turn Prompts to Agentic Workflows
Traditional AI applications operated on a simple request-response pattern. Modern AI agents, however, are fundamentally different. They maintain state across multiple interactions, orchestrate complex multi-step workflows, and dynamically adapt their approach based on intermediate results.
In practice, agentic workflows are systems in which language models and tools are orchestrated through a combination of predefined logic and dynamic decision-making. Unlike monolithic systems where a single model attempts everything, production agents break complex tasks down into specialized components that collaborate effectively.
The difference is profound. A simple customer service chatbot might answer questions from a knowledge base. An agentic customer service system, however, can search multiple data sources, escalate to specialized sub-agents for technical issues, draft response emails, schedule follow-up tasks, and learn from each interaction to improve future responses.
Stage 1: Development with Any Agentic Framework
Why LangGraph for Agent Development?
LangGraph has emerged as a leading framework for building stateful, multi-agent applications. Unlike traditional chain-based approaches, LangGraph uses a graph-based architecture where each node represents a unit of work and edges define the workflow paths between them.
The key advantages include:
Explicit State Management: LangGraph maintains persistent state across nodes, making it straightforward to track conversation history, intermediate results, and decision points. This is critical for debugging complex agent behaviors.
Visual Workflow Design: The graph structure provides an intuitive way to visualize and understand agent logic. When an agent misbehaves, you can trace execution through the graph to identify where things went wrong.
Flexible Control Flows: LangGraph supports diverse orchestration patterns—single agent, multi-agent, hierarchical, sequential—all within one framework. You can start simple and evolve as requirements grow.
Built-in Memory: Agents automatically store conversation histories and maintain context over time, enabling rich personalized interactions across sessions.
Core LangGraph Components
Nodes: Individual units of logic or action. A node might call an AI model, query a database, invoke an external API, or perform data transformation. Each node is a Python function that receives the current state and returns updates.
Edges: Define the workflow paths between nodes. These can be conditional (routing based on the node's output) or unconditional (always proceeding to the next step).
State: The data structure passed between nodes and updated through reducers. Proper state design is crucial—it should contain all information needed for decision-making while remaining manageable in size.
Checkpoints: LangGraph's checkpointing mechanism saves state at each node, enabling features like human-in-the-loop approval, retry logic, and debugging.
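A minimal sketch of switching checkpointing on, using the in-memory MemorySaver that ships with LangGraph; `workflow` stands for a StateGraph like the one built in the example below, and production systems would swap in a durable checkpointer:
from langgraph.checkpoint.memory import MemorySaver

# Persist state after every node so a run can be resumed, inspected,
# or paused for human approval. Swap MemorySaver for a durable backend
# in production.
checkpointer = MemorySaver()
agent = workflow.compile(checkpointer=checkpointer)

# Each thread_id identifies one conversation whose state is checkpointed.
config = {"configurable": {"thread_id": "user-123"}}
state = agent.invoke(
    {"query": "Latest LLMOps practices", "plan": [],
     "current_step": 0, "research_results": []},
    config=config,
)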
Implementing the Agentic Workflow Pattern
A robust production agent typically follows a cyclical pattern of planning, execution, reflection, and adaptation:
- Planning Phase: The agent analyzes the user's request and creates a structured plan, breaking complex problems into manageable steps.
- Execution Phase: The agent carries out planned actions using appropriate tools—search engines, calculators, code interpreters, database queries, or API calls.
- Reflection Phase: After each action, the agent evaluates results against expected outcomes. This critical thinking step determines whether to proceed, retry with a different approach, or seek additional information.
- Decision Phase: Based on reflection, the agent decides the next course of action—continue to the next step, loop back to refine the approach, or conclude with a final response.
This pattern handles real-world complexity far better than simple linear workflows. When an agent encounters unexpected results, the reflection phase enables adaptive responses rather than brittle failure.
Example: Building a Research Agent with LangGraph
from langgraph.graph import StateGraph, END
from langchain_openai import ChatOpenAI
from typing import TypedDict, List

class AgentState(TypedDict):
    query: str
    plan: List[str]
    current_step: int
    research_results: List[dict]
    next: str
    final_answer: str

def planning_node(state: AgentState):
    # Agent creates a research plan, one step per line
    llm = ChatOpenAI(model="gpt-4")
    response = llm.invoke(
        f"Create a research plan for: {state['query']}. Return one step per line."
    )
    plan = [line.strip() for line in response.content.splitlines() if line.strip()]
    return {"plan": plan, "current_step": 0}

def research_node(state: AgentState):
    # Execute the current research step
    step = state['plan'][state['current_step']]
    # perform_research is a placeholder for a web search, database query, etc.
    results = perform_research(step)
    return {"research_results": state['research_results'] + [results]}

def reflection_node(state: AgentState):
    # Evaluate whether we have enough information
    if len(state['research_results']) >= len(state['plan']):
        return {"next": "synthesize"}
    return {"next": "research", "current_step": state['current_step'] + 1}

def synthesize_node(state: AgentState):
    # Generate the final answer from all research
    llm = ChatOpenAI(model="gpt-4")
    answer = llm.invoke(f"Synthesize research: {state['research_results']}")
    return {"final_answer": answer.content}

# Build the graph
workflow = StateGraph(AgentState)
workflow.add_node("planning", planning_node)
workflow.add_node("research", research_node)
workflow.add_node("reflection", reflection_node)
workflow.add_node("synthesize", synthesize_node)

workflow.set_entry_point("planning")
workflow.add_edge("planning", "research")
workflow.add_edge("research", "reflection")
workflow.add_conditional_edges(
    "reflection",
    lambda s: s["next"],
    {"research": "research", "synthesize": "synthesize"}
)
workflow.add_edge("synthesize", END)

agent = workflow.compile()
# Invoke with an initial state, e.g.:
# agent.invoke({"query": "...", "plan": [], "current_step": 0, "research_results": []})
This pattern scales from simple workflows to complex multi-agent systems with dozens of specialized nodes.
Stage 2: CI/CD Pipeline for AI Agents
Traditional software CI/CD focuses on code quality, security, and deployment automation. Agent CI/CD must additionally handle model versioning, evaluation against behavioral benchmarks, and non-deterministic behavior.
Build Phase: Packaging Agent Dependencies
Unlike traditional applications, AI agents have unique packaging requirements:
- Model artifacts: Fine-tuned models, embeddings, or model configurations
- Vector databases: Pre-computed embeddings for knowledge retrieval
- Tool configurations: API credentials, endpoint URLs, rate limits
- Prompt templates: Versioned prompt engineering assets
- Evaluation datasets: Test cases for agent behavior validation
Best practice is to containerize everything. Docker provides reproducibility across environments and simplifies dependency management:
FROM python:3.11-slim
WORKDIR /app
COPY . user_agent/
WORKDIR /app/user_agent
RUN if [ -f requirements.txt ]; then \
pip install -r requirements.txt; \
else \
echo "No requirements.txt found"; \
fi
EXPOSE 8088
CMD ["python", "main.py"]
Register Phase: Version Control Beyond Git
Code versioning is necessary but insufficient for AI agents. You need comprehensive artifact versioning:
Container Registry: Azure Container Registry stores Docker images with semantic versioning. Each agent version becomes an immutable artifact that can be deployed or rolled back at any time.
Prompt Registry: Version control your prompts separately from code. Prompt changes can dramatically impact agent behavior, so treating them as first-class artifacts enables A/B testing and rapid iteration.
Configuration Management: Store agent configurations (model selection, temperature, token limits, tool permissions) in version-controlled files. This ensures reproducibility and enables easy rollback.
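As a small illustrative sketch of configuration-as-code (the file name and fields are hypothetical), the agent loads this at startup, so rolling back the repository also rolls back model choice and tool permissions:
import json
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class AgentConfig:
    model: str                # e.g. "gpt-4"
    temperature: float        # sampling temperature
    max_tokens: int           # hard cap per response
    allowed_tools: List[str]  # tool permissions for this agent version

def load_config(path: str = "agent_config.json") -> AgentConfig:
    # The config file is committed to git next to the agent code.
    with open(path) as f:
        return AgentConfig(**json.load(f))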
Evaluate Phase: Testing Non-Deterministic Behavior
The biggest challenge in agent CI/CD is evaluation. Unlike traditional software where unit tests verify exact outputs, agents produce variable responses that must be evaluated holistically.
Behavioral Testing: Define test cases that specify desired agent behaviors rather than exact outputs. For example, "When asked about product pricing, the agent should query the pricing API, handle rate limits gracefully, and present information in a structured format."
Evaluation Metrics: Track multiple dimensions:
- Task completion rate: Did the agent accomplish the goal?
- Tool usage accuracy: Did it call the right tools with correct parameters?
- Response quality: Measured via LLM-as-judge or human evaluation
- Latency: Time to first token and total response time
- Cost: Token usage and API call expenses
Adversarial Testing: Intentionally test edge cases—ambiguous requests, tool failures, rate limiting, conflicting information. Production agents will encounter these scenarios.
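A minimal sketch of a behavioral test with LLM-as-judge (the judge prompt, scoring scale, and canned answer are illustrative; in a real pipeline the answer comes from invoking the agent under test):
from langchain_openai import ChatOpenAI

JUDGE_PROMPT = """You are grading an AI agent's answer.
Question: {question}
Agent answer: {answer}
Expected behavior: {expected}
Reply with a single integer score from 1 (fails) to 5 (fully satisfies)."""

def judge_response(question: str, answer: str, expected: str) -> int:
    # LLM-as-judge: score the answer against a behavioral spec rather than
    # asserting an exact string match.
    judge = ChatOpenAI(model="gpt-4", temperature=0)
    reply = judge.invoke(JUDGE_PROMPT.format(
        question=question, answer=answer, expected=expected))
    return int(reply.content.strip())

def test_pricing_behavior():
    # In CI, `answer` would come from the agent under test.
    answer = "The Pro plan costs $49/month, billed annually."
    score = judge_response(
        question="How much does the Pro plan cost?",
        answer=answer,
        expected="Retrieves pricing data and presents it in a structured format",
    )
    assert score >= 4, f"Behavioral score too low: {score}"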
Recent research on CI/CD for AI agents emphasizes comprehensive instrumentation from day one. Track every input, output, API call, token usage, and decision point. After accumulating production data, patterns emerge showing which metrics actually predict failures versus noise.
Deploy Phase: Safe Production Rollouts
Never deploy agents directly to production. Implement progressive delivery:
Staging Environment: Deploy to a staging environment that mirrors production. Run automated tests and manual QA against real data (appropriately anonymized).
Canary Deployment: Route a small percentage of traffic (5-10%) to the new version. Monitor error rates, latency, user satisfaction, and cost metrics. Automatically roll back if any metric degrades beyond its threshold (a gating sketch follows this list).
Blue-Green Deployment: Maintain two production environments. Deploy to the inactive environment, verify it's healthy, then switch traffic. Enables instant rollback by switching back.
Feature Flags: Deploy new agent capabilities behind feature flags. Gradually enable them for specific user segments, gather feedback, and iterate before full rollout.
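A sketch of the canary gate referenced above (metric names, thresholds, and the sample values are illustrative; in practice they come from your monitoring backend):
def canary_is_healthy(baseline: dict, canary: dict) -> bool:
    # Reject the canary if error rate, latency, or cost regress beyond
    # tolerance, or if task completion drops.
    return all([
        canary["error_rate"] <= baseline["error_rate"] * 1.2,
        canary["p95_latency_ms"] <= baseline["p95_latency_ms"] * 1.3,
        canary["cost_per_conversation"] <= baseline["cost_per_conversation"] * 1.5,
        canary["task_completion_rate"] >= baseline["task_completion_rate"] * 0.95,
    ])

# Illustrative metrics for the stable version and the 5-10% canary slice.
baseline = {"error_rate": 0.02, "p95_latency_ms": 1800,
            "cost_per_conversation": 0.04, "task_completion_rate": 0.91}
canary = {"error_rate": 0.021, "p95_latency_ms": 1750,
          "cost_per_conversation": 0.043, "task_completion_rate": 0.93}

if canary_is_healthy(baseline, canary):
    print("Promote canary to 100% of traffic")
else:
    print("Roll back canary and keep the stable version")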
Now that we know how to create an agent using LangGraph, the next step is to understand how to deploy this LangGraph agent in Azure AI Foundry.
Stage 3: Azure AI Foundry Hosted Agents
Hosted agents are containerized agentic AI applications that run on Microsoft's Foundry Agent Service. They represent a paradigm shift from traditional prompt-based agents to fully code-driven, production-ready AI systems.
When to Use Hosted Agents:
✅ Complex agentic workflows - Multi-step reasoning, branching logic, conditional execution
✅ Custom tool integration - External APIs, databases, internal systems
✅ Framework-specific features - LangGraph graphs, multi-agent orchestration
✅ Production scale - Enterprise deployments requiring autoscaling
✅ Auth and governance - Identity and authentication, security and compliance controls
✅ CI/CD integration - Automated testing and deployment pipelines
Why Hosted Agents Matter
Hosted agents bridge the gap between experimental AI prototypes and production systems:
For Developers:
- Full control over agent logic via code
- Use familiar frameworks and tools
- Local testing before deployment
- Version control for agent code
For Enterprises:
- No infrastructure management overhead
- Built-in security and compliance
- Scalable pay-as-you-go pricing
- Integration with existing Azure ecosystem
For AI Systems:
- Complex reasoning patterns beyond prompts
- Stateful conversations with persistence
- Custom tool integration and orchestration
- Multi-agent collaboration
Before you get started with Foundry, deploy a Foundry project from the starter template using the Azure Developer CLI (azd):
# Initialize a new agent project
azd init -t https://github.com/Azure-Samples/azd-ai-starter-basic
# The template automatically provisions:
# - Foundry resource and project
# - Azure Container Registry
# - Application Insights for monitoring
# - Managed identities and RBAC
# Deploy
azd up
The extension significantly reduces the operational burden. What previously required extensive Azure knowledge and infrastructure-as-code expertise now works with a few CLI commands.
Local Development to Production Workflow
A streamlined workflow bridges development and production:
- Develop Locally: Build and test your LangGraph agent on your machine. Use the Foundry SDK to ensure compatibility with production APIs.
- Validate Locally: Run the agent locally against the Foundry Responses API to verify it works with production authentication and conversation management.
- Containerize: Package your agent in a Docker container with all dependencies.
- Deploy to Staging: Use azd deploy to push to a staging Foundry project. Run automated tests.
- Deploy to Production: Once validated, deploy to production. Foundry handles versioning, so you can maintain multiple agent versions and route traffic accordingly.
- Monitor and Iterate: Use Application Insights to monitor agent performance, identify issues, and plan improvements.
The Azure AI Toolkit for Visual Studio offers a great place to test your hosted agent.
You can also test it over REST, as sketched below.
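A rough sketch of what a REST smoke test might look like (the endpoint path, payload shape, and api-version below are assumptions; check the Foundry Responses API reference for the exact contract of your project):
import requests
from azure.identity import DefaultAzureCredential

# Acquire a token the same way the container-status script later in this
# guide does.
token = DefaultAzureCredential().get_token("https://ml.azure.com/.default").token

# Hypothetical request shape -- consult the Foundry docs for the exact
# path, payload, and api-version exposed by your project.
resp = requests.post(
    "https://<your-project-endpoint>/responses?api-version=2025-11-15-preview",
    headers={"Authorization": f"Bearer {token}",
             "Content-Type": "application/json"},
    json={"agent": "<your-agent-name>", "input": "What is 21 * 2?"},
    timeout=60,
)
print(resp.status_code, resp.json())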
Once you are able to run the agent and test it in the local playground, you can move on to registering, evaluating, and deploying the agent in Azure AI Foundry.
CI/CD with GitHub Actions
This repository includes a GitHub Actions workflow (`.github/workflows/mslearnagent-AutoDeployTrigger.yml`) that automatically builds and deploys the agent to Azure when changes are pushed to the main branch.
1. Set Up Service Principal
# Create service principal
az ad sp create-for-rbac \
--name "github-actions-agent-deploy" \
--role contributor \
--scopes /subscriptions/$SUBSCRIPTION_ID/resourceGroups/$RESOURCE_GROUP
# Output will include:
# - appId (AZURE_CLIENT_ID)
# - tenant (AZURE_TENANT_ID)
2. Configure Federated Credentials
# For GitHub Actions OIDC
az ad app federated-credential create \
--id $APP_ID \
--parameters '{
"name": "github-actions-deploy",
"issuer": "https://token.actions.githubusercontent.com",
"subject": "repo:YOUR_ORG/YOUR_REPO:ref:refs/heads/main",
"audiences": ["api://AzureADTokenExchange"]
}'
3. Set Required Permissions
Critical: the service principal needs the Azure AI User role on the AI Services resource:
# Get AI Services resource ID
AI_SERVICES_ID=$(az cognitiveservices account show \
--name $AI_SERVICES_NAME \
--resource-group $RESOURCE_GROUP \
--query id -o tsv)
# Assign Azure AI User role
az role assignment create \
--assignee $SERVICE_PRINCIPAL_ID \
--role "Azure AI User" \
--scope $AI_SERVICES_ID
4. Configure GitHub Secrets
Go to GitHub repository → Settings → Secrets and variables → Actions
Add the following secrets:
AZURE_CLIENT_ID=<from-service-principal>
AZURE_TENANT_ID=<from-service-principal>
AZURE_SUBSCRIPTION_ID=<your-subscription-id>
AZURE_AI_PROJECT_ENDPOINT=<your-project-endpoint>
ACR_NAME=<your-acr-name>
IMAGE_NAME=calculator-agent
AGENT_NAME=CalculatorAgent
5. Create GitHub Actions Workflow
Create .github/workflows/deploy-agent.yml:
name: Deploy Agent to Azure AI Foundry
on:
push:
branches:
- main
paths:
- 'main.py'
- 'custom_state_converter.py'
- 'requirements.txt'
- 'Dockerfile'
workflow_dispatch:
inputs:
version_tag:
description: 'Version tag (leave empty for auto-increment)'
required: false
type: string
permissions:
id-token: write
contents: read
jobs:
deploy:
runs-on: ubuntu-latest
steps:
- name: Checkout code
uses: actions/checkout@v4
- name: Generate version tag
id: version
run: |
if [ -n "${{ github.event.inputs.version_tag }}" ]; then
echo "VERSION=${{ github.event.inputs.version_tag }}" >> $GITHUB_OUTPUT
else
# Auto-increment version
VERSION="v$(date +%Y%m%d-%H%M%S)"
echo "VERSION=$VERSION" >> $GITHUB_OUTPUT
fi
- name: Azure Login (OIDC)
uses: azure/login@v1
with:
client-id: ${{ secrets.AZURE_CLIENT_ID }}
tenant-id: ${{ secrets.AZURE_TENANT_ID }}
subscription-id: ${{ secrets.AZURE_SUBSCRIPTION_ID }}
- name: Set up Python
uses: actions/setup-python@v4
with:
python-version: '3.11'
- name: Install Azure AI SDK
run: |
pip install azure-ai-projects azure-identity
- name: Build and push Docker image
run: |
az acr build \
--registry ${{ secrets.ACR_NAME }} \
--image ${{ secrets.IMAGE_NAME }}:${{ steps.version.outputs.VERSION }} \
--image ${{ secrets.IMAGE_NAME }}:latest \
--file Dockerfile \
.
- name: Register agent version
env:
AZURE_AI_PROJECT_ENDPOINT: ${{ secrets.AZURE_AI_PROJECT_ENDPOINT }}
ACR_NAME: ${{ secrets.ACR_NAME }}
IMAGE_NAME: ${{ secrets.IMAGE_NAME }}
AGENT_NAME: ${{ secrets.AGENT_NAME }}
VERSION: ${{ steps.version.outputs.VERSION }}
run: |
python - <<EOF
import os
from azure.ai.projects import AIProjectClient
from azure.identity import DefaultAzureCredential
from azure.ai.projects.models import ImageBasedHostedAgentDefinition
project_endpoint = os.environ["AZURE_AI_PROJECT_ENDPOINT"]
credential = DefaultAzureCredential()
project_client = AIProjectClient.from_connection_string(
credential=credential,
conn_str=project_endpoint,
)
agent_name = os.environ["AGENT_NAME"]
version = os.environ["VERSION"]
image_uri = f"{os.environ['ACR_NAME']}.azurecr.io/{os.environ['IMAGE_NAME']}:{version}"
agent_definition = ImageBasedHostedAgentDefinition(
image=image_uri,
cpu=1.0,
memory="2Gi",
)
agent = project_client.agents.create_or_update(
agent_id=agent_name,
body=agent_definition
)
print(f"Agent version registered: {agent.version}")
EOF
- name: Start agent
run: |
echo "Agent deployed successfully with version ${{ steps.version.outputs.VERSION }}"
- name: Deployment summary
run: |
echo "### Deployment Summary" >> $GITHUB_STEP_SUMMARY
echo "- **Agent Name**: ${{ secrets.AGENT_NAME }}" >> $GITHUB_STEP_SUMMARY
echo "- **Version**: ${{ steps.version.outputs.VERSION }}" >> $GITHUB_STEP_SUMMARY
echo "- **Image**: ${{ secrets.ACR_NAME }}.azurecr.io/${{ secrets.IMAGE_NAME }}:${{ steps.version.outputs.VERSION }}" >> $GITHUB_STEP_SUMMARY
echo "- **Status**: Deployed" >> $GITHUB_STEP_SUMMARY
6. Add Container Status Verification
To ensure deployments are truly successful, add a script to verify container startup before marking the pipeline as complete.
Create wait_for_container.py:
"""
Wait for agent container to be ready.
This script polls the agent container status until it's running successfully
or times out. Designed for use in CI/CD pipelines to verify deployment.
"""
import os
import sys
import time
import requests
from typing import Optional, Dict, Any
from azure.identity import DefaultAzureCredential
class ContainerStatusWaiter:
"""Polls agent container status until ready or timeout."""
def __init__(
self,
project_endpoint: str,
agent_name: str,
agent_version: str,
timeout_seconds: int = 600,
poll_interval: int = 10,
):
"""
Initialize the container status waiter.
Args:
project_endpoint: Azure AI Foundry project endpoint
agent_name: Name of the agent
agent_version: Version of the agent
timeout_seconds: Maximum time to wait (default: 10 minutes)
poll_interval: Seconds between status checks (default: 10s)
"""
self.project_endpoint = project_endpoint.rstrip("/")
self.agent_name = agent_name
self.agent_version = agent_version
self.timeout_seconds = timeout_seconds
self.poll_interval = poll_interval
self.api_version = "2025-11-15-preview"
# Get Azure AD token
credential = DefaultAzureCredential()
token = credential.get_token("https://ml.azure.com/.default")
self.headers = {
"Authorization": f"Bearer {token.token}",
"Content-Type": "application/json",
}
def _get_container_url(self) -> str:
"""Build the container status URL."""
return (
f"{self.project_endpoint}/agents/{self.agent_name}"
f"/versions/{self.agent_version}/containers/default"
)
def get_container_status(self) -> Optional[Dict[str, Any]]:
"""Get current container status."""
url = f"{self._get_container_url()}?api-version={self.api_version}"
try:
response = requests.get(url, headers=self.headers, timeout=30)
if response.status_code == 200:
return response.json()
elif response.status_code == 404:
return None
else:
print(f"⚠️ Warning: GET container returned {response.status_code}")
return None
except Exception as e:
print(f"⚠️ Warning: Error getting container status: {e}")
return None
def wait_for_container_running(self) -> bool:
"""
Wait for container to reach running state.
Returns:
True if container is running, False if timeout or error
"""
print(f"\n🔍 Checking container status for {self.agent_name} v{self.agent_version}")
print(f"⏱️ Timeout: {self.timeout_seconds}s | Poll interval: {self.poll_interval}s")
print("-" * 70)
start_time = time.time()
iteration = 0
while time.time() - start_time < self.timeout_seconds:
iteration += 1
elapsed = int(time.time() - start_time)
container = self.get_container_status()
if not container:
print(f"[{iteration}] ({elapsed}s) ⏳ Container not found yet, waiting...")
time.sleep(self.poll_interval)
continue
# Extract status information
status = (
container.get("status")
or container.get("state")
or container.get("provisioningState")
or "Unknown"
)
# Check for replicas information
replicas = container.get("replicas", {})
ready_replicas = replicas.get("ready", 0)
desired_replicas = replicas.get("desired", 0)
print(f"[{iteration}] ({elapsed}s) 📊 Status: {status}")
if replicas:
print(f" 🔢 Replicas: {ready_replicas}/{desired_replicas} ready")
# Check if container is running and ready
if status.lower() in ["running", "succeeded", "ready"]:
if desired_replicas == 0 or ready_replicas >= desired_replicas:
print("\n" + "=" * 70)
print("✅ Container is running and ready!")
print("=" * 70)
return True
elif status.lower() in ["failed", "error", "cancelled"]:
print("\n" + "=" * 70)
print(f"❌ Container failed to start: {status}")
print("=" * 70)
return False
time.sleep(self.poll_interval)
# Timeout reached
print("\n" + "=" * 70)
print(f"⏱️ Timeout reached after {self.timeout_seconds}s")
print("=" * 70)
return False
def main():
"""Main entry point for CLI usage."""
project_endpoint = os.getenv("AZURE_AI_PROJECT_ENDPOINT")
agent_name = os.getenv("AGENT_NAME")
agent_version = os.getenv("AGENT_VERSION")
timeout = int(os.getenv("TIMEOUT_SECONDS", "600"))
poll_interval = int(os.getenv("POLL_INTERVAL_SECONDS", "10"))
if not all([project_endpoint, agent_name, agent_version]):
print("❌ Error: Missing required environment variables")
sys.exit(1)
waiter = ContainerStatusWaiter(
project_endpoint=project_endpoint,
agent_name=agent_name,
agent_version=agent_version,
timeout_seconds=timeout,
poll_interval=poll_interval,
)
success = waiter.wait_for_container_running()
sys.exit(0 if success else 1)
if __name__ == "__main__":
main()
Key Features:
- REST API Polling: Uses Azure AI Foundry REST API to check container status
- Timeout Handling: Configurable timeout (default 10 minutes)
- Progress Tracking: Shows iteration count and elapsed time
- Replica Checking: Verifies all desired replicas are ready
- Clear Output: Emoji-enhanced status messages for easy reading
- Exit Codes: Returns 0 for success, 1 for failure (CI/CD friendly)
Update workflow to include verification:
Add this step after starting the agent:
- name: Start the new agent version
id: start_agent
env:
FOUNDRY_ACCOUNT: ${{ steps.foundry_details.outputs.FOUNDRY_ACCOUNT }}
PROJECT_NAME: ${{ steps.foundry_details.outputs.PROJECT_NAME }}
AGENT_NAME: ${{ secrets.AGENT_NAME }}
run: |
LATEST_VERSION=$(az cognitiveservices agent show \
--account-name "$FOUNDRY_ACCOUNT" \
--project-name "$PROJECT_NAME" \
--name "$AGENT_NAME" \
--query "versions.latest.version" -o tsv)
echo "AGENT_VERSION=$LATEST_VERSION" >> $GITHUB_OUTPUT
az cognitiveservices agent start \
--account-name "$FOUNDRY_ACCOUNT" \
--project-name "$PROJECT_NAME" \
--name "$AGENT_NAME" \
--agent-version $LATEST_VERSION
- name: Wait for container to be ready
env:
AZURE_AI_PROJECT_ENDPOINT: ${{ secrets.AZURE_AI_PROJECT_ENDPOINT }}
AGENT_NAME: ${{ secrets.AGENT_NAME }}
AGENT_VERSION: ${{ steps.start_agent.outputs.AGENT_VERSION }}
TIMEOUT_SECONDS: 600
POLL_INTERVAL_SECONDS: 15
run: |
echo "⏳ Waiting for container to be ready..."
python wait_for_container.py
- name: Deployment Summary
if: success()
run: |
echo "## Deployment Complete! 🚀" >> $GITHUB_STEP_SUMMARY
echo "" >> $GITHUB_STEP_SUMMARY
echo "- **Agent**: ${{ secrets.AGENT_NAME }}" >> $GITHUB_STEP_SUMMARY
echo "- **Version**: ${{ steps.version.outputs.VERSION }}" >> $GITHUB_STEP_SUMMARY
echo "- **Status**: ✅ Container running and ready" >> $GITHUB_STEP_SUMMARY
echo "" >> $GITHUB_STEP_SUMMARY
echo "### Deployment Timeline" >> $GITHUB_STEP_SUMMARY
echo "1. ✅ Image built and pushed to ACR" >> $GITHUB_STEP_SUMMARY
echo "2. ✅ Agent version registered" >> $GITHUB_STEP_SUMMARY
echo "3. ✅ Container started" >> $GITHUB_STEP_SUMMARY
echo "4. ✅ Container verified as running" >> $GITHUB_STEP_SUMMARY
- name: Deployment Failed Summary
if: failure()
run: |
echo "## Deployment Failed ❌" >> $GITHUB_STEP_SUMMARY
echo "" >> $GITHUB_STEP_SUMMARY
echo "Please check the logs above for error details." >> $GITHUB_STEP_SUMMARY
Benefits of Container Status Verification:
- Deployment Confidence: Know for certain that the container started successfully
- Early Failure Detection: Catch startup errors before users are affected
- CI/CD Gate: Pipeline only succeeds when container is actually ready
- Debugging Aid: Clear logs show container startup progress
- Timeout Protection: Prevents infinite waits with configurable timeout
REST API Endpoints Used:
GET {endpoint}/agents/{agent_name}/versions/{agent_version}/containers/default?api-version=2025-11-15-preview
Response includes:
- status or state: Container state (Running, Failed, etc.)
- replicas.ready: Number of ready replicas
- replicas.desired: Target number of replicas
- error: Error details if failed
Container States:
- Running/Ready: Container is operational
- InProgress: Container is starting up
- Failed/Error: Container failed to start
- Stopped: Container was stopped
7. Trigger Deployment
# Automatic trigger - push to main
git add .
git commit -m "Update agent implementation"
git push origin main
# Manual trigger - via GitHub UI
# Go to Actions → Deploy Agent to Azure AI Foundry → Run workflow
This will trigger the workflow as soon as you check in the implementation code.
You can play with the agent in the Foundry UI.
Evaluation is now part of the workflow.
You can also visualize the evaluation results in AI Foundry.
Best Practices for Production Agent LLMOps
1. Start with Simple Workflows, Add Complexity Gradually
Don't build a complex multi-agent system on day one. Start with a single agent that does one task well. Once that's stable in production, add additional capabilities:
- Single agent with basic tool calling
- Add memory/state for multi-turn conversations
- Introduce specialized sub-agents for complex tasks
- Implement multi-agent collaboration
This incremental approach reduces risk and enables learning from real usage before investing in advanced features.
2. Instrument Everything from Day One
The worst time to add observability is after you have a production incident. Comprehensive instrumentation should be part of your initial development:
- Log every LLM call with inputs, outputs, token usage
- Track all tool invocations
- Record decision points in agent reasoning
- Capture timing metrics for every operation
- Log errors with full context
After accumulating production data, you'll identify which metrics matter most. But you can't retroactively add logging for incidents that already occurred.
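A minimal sketch of structured call-site instrumentation (logger setup and field names are up to you; in the Foundry deployment above, these records would typically land in Application Insights):
import json
import logging
import time

logger = logging.getLogger("agent.telemetry")

def log_llm_call(node: str, prompt: str, response: str, usage: dict) -> None:
    # One structured record per LLM call: inputs, outputs, token usage,
    # and timing live together so incidents can be reconstructed later.
    logger.info(json.dumps({
        "event": "llm_call",
        "node": node,
        "prompt_chars": len(prompt),
        "response_preview": response[:200],
        "prompt_tokens": usage.get("prompt_tokens"),
        "completion_tokens": usage.get("completion_tokens"),
        "timestamp": time.time(),
    }))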
3. Build Evaluation into the Development Process
Don't wait until deployment to evaluate agent quality. Integrate evaluation throughout development:
- Maintain a growing set of test conversations
- Run evaluations on every code change
- Track metrics over time to identify regressions
- Include diverse scenarios—happy path, edge cases, adversarial inputs
Use LLM-as-judge for scalable automated evaluation, supplemented with periodic human review of sample outputs.
4. Embrace Non-Determinism, But Set Boundaries
Agents are inherently non-deterministic, but that doesn't mean anything goes:
- Set acceptable ranges for variability in testing
- Use temperature and sampling controls to manage randomness
- Implement retry logic with exponential backoff (sketched after this list)
- Add fallback behaviors for when primary approaches fail
- Use assertions to verify critical invariants (e.g., "agent must never perform destructive actions without confirmation")
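The retry point above, as a small sketch (attempt counts and delays are illustrative; narrow the exception handling to genuinely transient errors in real code):
import random
import time

def call_with_backoff(fn, max_attempts: int = 4, base_delay: float = 1.0):
    # Retry transient failures (rate limits, timeouts) with exponential
    # backoff plus jitter; re-raise once the attempt budget is exhausted.
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception as exc:  # restrict to transient error types in practice
            if attempt == max_attempts:
                raise
            delay = base_delay * (2 ** (attempt - 1)) + random.uniform(0, 0.5)
            print(f"Attempt {attempt} failed ({exc}); retrying in {delay:.1f}s")
            time.sleep(delay)

# Usage: wrap any flaky call, e.g. an LLM or tool invocation.
# result = call_with_backoff(lambda: llm.invoke(prompt))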
5. Prioritize Security and Governance from Day One
Security shouldn't be an afterthought:
- Use managed identities and RBAC for all resource access
- Implement least-privilege principles—agents get only necessary permissions
- Add content filtering for inputs and outputs
- Monitor for prompt injection and jailbreak attempts
- Maintain audit logs for compliance
- Regularly review and update security policies
6. Design for Failure
Your agents will fail. Design systems that degrade gracefully:
- Implement retry logic for transient failures
- Provide clear error messages to users
- Include fallback behaviors (e.g., escalate to human support)
- Never leave users stuck—always provide a path forward
- Log failures with full context for post-incident analysis
7. Balance Automation with Human Oversight
Fully autonomous agents are powerful but risky. Consider human-in-the-loop workflows for high-stakes decisions:
- Draft responses that require approval before sending
- Request confirmation before executing destructive actions (see the sketch after this list)
- Escalate ambiguous situations to human operators
- Provide clear audit trails of agent actions
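In LangGraph, the confirmation gate above maps naturally onto checkpointing plus interrupts; a sketch, assuming a graph with a hypothetical "execute_action" node that performs the destructive step:
from langgraph.checkpoint.memory import MemorySaver

# Pause the graph before the node that performs a destructive action so a
# human can inspect the checkpointed state and approve or reject it.
# "execute_action" is a hypothetical node name in your own graph.
agent = workflow.compile(
    checkpointer=MemorySaver(),
    interrupt_before=["execute_action"],
)

config = {"configurable": {"thread_id": "ticket-42"}}
agent.invoke({"query": "Delete the customer's stale records"}, config=config)

# After a human reviews the pending state, resume from the checkpoint:
# agent.invoke(None, config=config)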
8. Manage Costs Proactively
LLM API costs can escalate quickly at scale:
- Monitor token usage per conversation
- Set per-conversation token limits (a budget sketch follows this list)
- Use caching for repeated queries
- Choose appropriate models (not always the largest)
- Consider local models for suitable use cases
- Alert on cost anomalies that indicate runaway loops
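A sketch of the per-conversation budget referenced above (the cap and blended price are illustrative):
class TokenBudget:
    # Tracks token usage per conversation and enforces a hard cap so a
    # runaway loop cannot burn an unbounded number of tokens.
    def __init__(self, max_tokens: int = 50_000, price_per_1k: float = 0.01):
        self.max_tokens = max_tokens
        self.price_per_1k = price_per_1k  # illustrative blended $/1K tokens
        self.used = 0

    def record(self, prompt_tokens: int, completion_tokens: int) -> None:
        self.used += prompt_tokens + completion_tokens
        if self.used > self.max_tokens:
            raise RuntimeError(
                f"Token budget exceeded: {self.used}/{self.max_tokens}")

    @property
    def estimated_cost(self) -> float:
        return self.used / 1000 * self.price_per_1k

# Usage: one budget per conversation, updated after every model call.
budget = TokenBudget(max_tokens=20_000)
budget.record(prompt_tokens=850, completion_tokens=300)
print(f"Spent so far: ${budget.estimated_cost:.4f}")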
9. Plan for Continuous Learning
Agents should improve over time:
- Collect feedback on agent responses (thumbs up/down)
- Analyze conversations that required escalation
- Identify common failure patterns
- Fine-tune models on production interaction data (with appropriate consent)
- Iterate on prompts based on real usage
- Share learnings across the team
10. Document Everything
Comprehensive documentation is critical as teams scale:
- Agent architecture and design decisions
- Tool configurations and API contracts
- Deployment procedures and runbooks
- Incident response procedures
- Version migration guides
- Evaluation methodologies
Conclusion
You now have a complete, production-ready AI agent deployed to Azure AI Foundry with:
- LangGraph-based agent orchestration
- Tool-calling capabilities
- Multi-turn conversation support
- Containerized deployment
- CI/CD automation
- Evaluation framework
- Multiple client implementations
Key Takeaways
- LangGraph provides flexible agent orchestration with state management
- Azure AI Agent Server SDK simplifies deployment to Azure AI Foundry
- Custom state converter is critical for production deployments with tool calls
- CI/CD automation enables rapid iteration and deployment
- Evaluation framework ensures agent quality and performance
Resources
- Azure AI Foundry Documentation
- LangGraph Documentation
- Azure AI Agent Server SDK
- OpenAI Responses API
Thanks
Manoranjan Rajguru
https://www.linkedin.com/in/manoranjan-rajguru/