Artificial intelligence
Beyond the Model: Empower your AI with Data Grounding and Model Training
Discover how Microsoft Foundry goes beyond foundational models to deliver enterprise-grade AI solutions. Learn how data grounding, model tuning, and agentic orchestration unlock faster time-to-value, improved accuracy, and scalable workflows across industries.

Answer synthesis in Foundry IQ: Quality metrics across 10,000 queries
With answers, you can control your entire RAG pipeline directly in Foundry IQ by Azure AI Search, without integrations. Responding only when the data supports it, answers delivers grounded, steerable, citation-rich responses and traces each piece of information to its original source. Here's how it works and how it performed across our experiments.

Publishing Agents from Microsoft Foundry to Microsoft 365 Copilot & Teams
Better Together is a series on how Microsoft’s AI platforms work seamlessly to build, deploy, and manage intelligent agents at enterprise scale. As organizations embrace AI across every workflow, Microsoft Foundry, Microsoft 365, Agent 365, and Microsoft Copilot Studio are coming together to deliver a unified approach—from development to deployment to day-to-day operations. This three-part series explores how these technologies connect to help enterprises build AI agents that are secure, governed, and deeply integrated with Microsoft’s product ecosystem. Series Overview Part 1: Publishing from Foundry to Microsoft 365 Copilot and Microsoft Teams Part 2: Foundry + Agent 365 — Native Integration for Enterprise AI Part 3: Microsoft Copilot Studio Integration with Foundry Agents This blog focuses on Part 1: Publishing from Foundry to Microsoft 365 Copilot—how developers can now publish agents built in Foundry directly to Microsoft 365 Copilot and Teams in just a few clicks. Build once. Publish everywhere. Developers can now take an AI agent built in Microsoft Foundry and publish it directly to Microsoft 365 Copilot and Microsoft Teams in just a few clicks. The new streamlined publishing flow eliminates manual setup across Entra ID, Azure Bot Service, and manifest files, turning hours of configuration into a seamless, guided flow in the Foundry Playground. Simplifying Agent Publishing for Microsoft 365 Copilot & Microsoft Teams Previously, deploying a Foundry AI agent into Microsoft 365 Copilot and Microsoft Teams required multiple steps: app registration, bot provisioning, manifest editing, and admin approval. With the new Foundry → M365 integration, the process is straightforward and intuitive. Key capabilities No-code publishing — Prepare, package, and publish agents directly from Foundry Playground. Unified build — A single agent package powers multiple Microsoft 365 channels, including Teams Chat, Microsoft 365 Copilot Chat, and BizChat. Agent-type agnostic — Works seamlessly whether you have a prompt agent, hosted agent, or workflow agent. Built-in Governance — Every agent published to your organization is automatically routed through Microsoft 365 Admin Center (MAC) for review, approval, and monitoring. Downloadable package — Developers can download a .zip for local testing or submission to the Microsoft Marketplace. For pro-code developers, the experience is also simplified. A C# code-first sample in the Agent Toolkit for Visual Studio is searchable, featured, and ready to use. Why It Matters This integration isn’t just about convenience; it’s about scale, control, and trust. Faster time to value — Deliver intelligent agents where people already work, without infrastructure overhead. Enterprise control — Admins retain full oversight via Microsoft 365 Admin Center, with built-in approval, review and governance flows. Developer flexibility — Both low-code creators and pro-code developers benefit from the unified publishing experience. Better Together — This capability lays the groundwork for Agent 365 publishing and deeper M365 integrations. Real-world scenarios YoungWilliams built Priya, an AI agent that helps handle government service inquiries faster and more efficiently. Using the one-click publishing flow, Priya was quickly deployed to Microsoft Teams and M365 Copilot without manual setup. This allowed Young Williams’ customers to provide faster, more accurate responses while keeping governance and compliance intact. 
“Integrating Microsoft Foundry with Microsoft 365 Copilot fundamentally changed how we deliver AI solutions to our government partners,” said John Tidwell, CTO of YoungWilliams. “With Foundry's one-click publishing to Teams and Copilot, we can take an idea from prototype to production in days instead of weeks—while maintaining the enterprise-grade security and governance our clients expect. It's a game changer for how public services can adopt AI responsibly and at scale.” Availability Publishing from Foundry to M365 is in Public Preview within the Foundry Playground. Developers can explore the preview in Microsoft Foundry and test the Teams / M365 publishing flow today. SDK and CLI extensions for code-first publishing are generally available. What's Next in the Better Together Series This blog is part of the broader Better Together series connecting Microsoft Foundry, Microsoft 365, Agent 365, and Microsoft Copilot Studio. Continue the journey: Foundry + Agent 365 — Native Integration for Enterprise AI (Link) Start building today [Quickstart — Publish an Agent to Microsoft 365 ] Try it now in the new Foundry Playground

OptiMind: A small language model with optimization expertise
Turning a real-world decision problem into a solver-ready optimization model can take days—sometimes weeks—even for experienced teams. The hardest part is often not solving the problem; it's translating business intent into precise mathematical objectives, constraints, and variables. OptiMind is designed to remove that bottleneck. This optimization‑aware language model translates natural‑language problem descriptions into solver‑ready mathematical formulations, helping organizations move from ideas to decisions faster. Now available in public preview as an experimental model through Microsoft Foundry, OptiMind targets one of the more expertise‑intensive steps in modern optimization workflows. Addressing the Optimization Bottleneck Mathematical optimization underpins many enterprise‑critical decisions—from designing supply chains and scheduling workforces to structuring financial portfolios and deploying networks. While today's solvers can handle enormous and complex problem instances, formulating those problems remains a major obstacle. Defining objectives, constraints, and decision variables is an expertise‑driven process that often takes days or weeks, even when the underlying business problem is well understood. OptiMind tries to address this gap by automating and accelerating formulation. Developed by Microsoft Research, OptiMind transforms what was once a slow, error‑prone modeling task into a streamlined, repeatable step—freeing teams to focus on decision quality rather than syntax. What makes OptiMind different? OptiMind is not just a language model but a specialized system built for real-world optimization tasks. Unlike general-purpose large language models adapted for optimization through prompting, OptiMind is purpose-built for mixed integer linear programming (MILP), and its design reflects this singular focus. At inference time, OptiMind follows a multi‑stage process: Problem classification (e.g., scheduling, routing, network design) Hint retrieval tailored to the identified problem class Solution generation in solver‑compatible formats such as GurobiPy Optional self‑correction, where multiple candidate formulations are generated and validated This design can improve reliability without relying on agentic orchestration or multiple large models. In internal evaluations on cleaned public benchmarks—including IndustryOR, Mamo‑Complex, and OptMATH—OptiMind demonstrated higher formulation accuracy than similarly sized open models and competitive performance relative to significantly larger systems. OptiMind improved accuracy by approximately 10 percent over the base model, and it was also found to match or exceed the performance of open-source models under 32 billion parameters. For more information on the model, please read the official research blog or the technical paper for OptiMind. Practical use cases: Unlocking efficiency across domains OptiMind is especially valuable where modeling effort—not solver capability—is the primary bottleneck; the sketch below shows the kind of solver-ready output this refers to.
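To make "solver-ready" concrete, here is a minimal hand-written GurobiPy sketch of the kind of formulation OptiMind targets: a small production-planning MILP. The scenario, variable names, and coefficients are assumptions for illustration only (this is not OptiMind output), and running it requires a local Gurobi installation and license.

# Illustrative solver-ready formulation: choose integer production quantities
# for two products to maximize profit under machine- and labor-hour limits.
import gurobipy as gp
from gurobipy import GRB

m = gp.Model("production_planning")

# Decision variables: units of product A and product B to produce.
x_a = m.addVar(vtype=GRB.INTEGER, lb=0, name="units_product_a")
x_b = m.addVar(vtype=GRB.INTEGER, lb=0, name="units_product_b")

# Constraints: limited machine hours and labor hours (coefficients are made up).
m.addConstr(2 * x_a + 4 * x_b <= 100, name="machine_hours")
m.addConstr(3 * x_a + 2 * x_b <= 90, name="labor_hours")

# Objective: maximize total profit.
m.setObjective(30 * x_a + 45 * x_b, GRB.MAXIMIZE)

m.optimize()

if m.status == GRB.OPTIMAL:
    print(f"Produce A: {x_a.X:.0f}, B: {x_b.X:.0f}, profit: {m.ObjVal:.0f}")

In practice, the value of a model like OptiMind is producing formulations of this shape directly from a plain-language problem description, which a practitioner can then review and hand to the solver.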
Typical use cases include: Supply Chain Network Design: Faster formulation of multi‑period network models and logistics flows Manufacturing and Workforce Scheduling: Easier capacity planning under complex operational constraints Logistics and Routing Optimization: Rapid modeling that captures real‑world constraints and variability Financial Portfolio Optimization: More efficient exploration of portfolios under regulatory and market constraints By reducing the time and expertise required to move from problem description to validated model, OptiMind helps teams reach actionable decisions faster and with greater confidence. Getting started OptiMind is available today as an experimental model, and Microsoft Research welcomes feedback from practitioners and enterprise teams. Next steps: Explore the research details: Read more about the model on Foundry Labs and the technical paper on arXiv Try the model: Access OptiMind through Microsoft Foundry Test sample code: Available in the OptiMind GitHub repository Take the next step in optimization innovation with OptiMind—empowering faster, more accurate, and cost-effective problem solving for the future of decision intelligence.

From Zero to Hero: AgentOps - End-to-End Lifecycle Management for Production AI Agents
The shift from proof-of-concept AI agents to production-ready systems isn't just about better models—it's about building robust infrastructure that can develop, deploy, and maintain intelligent agents at enterprise scale. As organizations move beyond simple chatbots to agentic systems that plan, reason, and act autonomously, the need for comprehensive Agent LLMOps becomes critical. This guide walks through the complete lifecycle for building production AI agents, from development through deployment to monitoring, with special focus on leveraging Azure AI Foundry's hosted agents infrastructure. The Evolution: From Single-Turn Prompts to Agentic Workflows Traditional AI applications operated on a simple request-response pattern. Modern AI agents, however, are fundamentally different. They maintain state across multiple interactions, orchestrate complex multi-step workflows, and dynamically adapt their approach based on intermediate results. According to recent analysis, agentic workflows represent systems where language models and tools are orchestrated through a combination of predefined logic and dynamic decision-making. Unlike monolithic systems where a single model attempts everything, production agents break down complex tasks into specialized components that collaborate effectively. The difference is profound. A simple customer service chatbot might answer questions from a knowledge base. An agentic customer service system, however, can search multiple data sources, escalate to specialized sub-agents for technical issues, draft response emails, schedule follow-up tasks, and learn from each interaction to improve future responses. Stage 1: Development with any agentic framework Why LangGraph for Agent Development? LangGraph has emerged as a leading framework for building stateful, multi-agent applications. Unlike traditional chain-based approaches, LangGraph uses a graph-based architecture where each node represents a unit of work and edges define the workflow paths between them. The key advantages include: Explicit State Management: LangGraph maintains persistent state across nodes, making it straightforward to track conversation history, intermediate results, and decision points. This is critical for debugging complex agent behaviors. Visual Workflow Design: The graph structure provides an intuitive way to visualize and understand agent logic. When an agent misbehaves, you can trace execution through the graph to identify where things went wrong. Flexible Control Flows: LangGraph supports diverse orchestration patterns—single agent, multi-agent, hierarchical, sequential—all within one framework. You can start simple and evolve as requirements grow. Built-in Memory: Agents automatically store conversation histories and maintain context over time, enabling rich personalized interactions across sessions. Core LangGraph Components Nodes: Individual units of logic or action. A node might call an AI model, query a database, invoke an external API, or perform data transformation. Each node is a Python function that receives the current state and returns updates. Edges: Define the workflow paths between nodes. These can be conditional (routing based on the node's output) or unconditional (always proceeding to the next step). State: The data structure passed between nodes and updated through reducers. Proper state design is crucial—it should contain all information needed for decision-making while remaining manageable in size. 
Checkpoints: LangGraph's checkpointing mechanism saves state at each node, enabling features like human-in-the-loop approval, retry logic, and debugging. Implementing the Agentic Workflow Pattern A robust production agent typically follows a cyclical pattern of planning, execution, reflection, and adaptation: Planning Phase: The agent analyzes the user's request and creates a structured plan, breaking complex problems into manageable steps. Execution Phase: The agent carries out planned actions using appropriate tools—search engines, calculators, code interpreters, database queries, or API calls. Reflection Phase: After each action, the agent evaluates results against expected outcomes. This critical thinking step determines whether to proceed, retry with a different approach, or seek additional information. Decision Phase: Based on reflection, the agent decides the next course of action—continue to the next step, loop back to refine the approach, or conclude with a final response. This pattern handles real-world complexity far better than simple linear workflows. When an agent encounters unexpected results, the reflection phase enables adaptive responses rather than brittle failure. Example: Building a Research Agent with LangGraph

from langgraph.graph import StateGraph, END
from langchain_openai import ChatOpenAI
from typing import TypedDict, List

class AgentState(TypedDict):
    query: str
    plan: List[str]
    current_step: int
    research_results: List[dict]
    final_answer: str
    next: str  # routing decision written by the reflection node

def planning_node(state: AgentState):
    # Agent creates a research plan (one step per line)
    llm = ChatOpenAI(model="gpt-4")
    plan_text = llm.invoke(f"Create a research plan for: {state['query']}").content
    plan = [line.strip() for line in plan_text.split("\n") if line.strip()]
    return {"plan": plan, "current_step": 0}

def research_node(state: AgentState):
    # Execute current research step
    step = state['plan'][state['current_step']]
    # Perform web search, database query, etc.
    # (perform_research is a user-supplied helper defined elsewhere)
    results = perform_research(step)
    return {"research_results": state['research_results'] + [results]}

def reflection_node(state: AgentState):
    # Evaluate if we have enough information
    if len(state['research_results']) >= len(state['plan']):
        return {"next": "synthesize"}
    return {"next": "research", "current_step": state['current_step'] + 1}

def synthesize_node(state: AgentState):
    # Generate final answer from all research
    llm = ChatOpenAI(model="gpt-4")
    answer = llm.invoke(f"Synthesize research: {state['research_results']}").content
    return {"final_answer": answer}

# Build the graph
workflow = StateGraph(AgentState)
workflow.add_node("planning", planning_node)
workflow.add_node("research", research_node)
workflow.add_node("reflection", reflection_node)
workflow.add_node("synthesize", synthesize_node)
workflow.set_entry_point("planning")
workflow.add_edge("planning", "research")
workflow.add_edge("research", "reflection")
workflow.add_conditional_edges(
    "reflection",
    lambda s: s["next"],
    {"research": "research", "synthesize": "synthesize"}
)
workflow.add_edge("synthesize", END)
agent = workflow.compile()

This pattern scales from simple workflows to complex multi-agent systems with dozens of specialized nodes. Stage 2: CI/CD Pipeline for AI Agents Traditional software CI/CD focuses on code quality, security, and deployment automation. Agent CI/CD must additionally handle model versioning, evaluation against behavioral benchmarks, and non-deterministic behavior.
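To make "evaluation against behavioral benchmarks" concrete before walking through the phases, here is a minimal, hypothetical pytest-style check. Everything in it is an assumption for illustration: run_agent stands in for whatever function invokes your compiled LangGraph agent, and the test cases and 90% pass-rate threshold are placeholders, not part of any Foundry or LangGraph API.

# Hypothetical behavioral benchmark for CI: instead of asserting exact outputs,
# check task-level behaviors across a small suite and require a minimum pass
# rate, which tolerates non-deterministic responses.

def run_agent(prompt: str) -> dict:
    # Placeholder: replace with a call into your compiled LangGraph agent that
    # returns the final answer text and the names of the tools it invoked.
    return {"answer": "Please confirm before I cancel; refunds apply within 30 days.",
            "tool_calls": ["search_policy_docs"]}

CASES = [
    {"prompt": "What is our refund window for online orders?",
     "must_mention": ["refund"], "expected_tool": "search_policy_docs"},
    {"prompt": "Cancel order 1234 immediately.",
     "must_mention": ["confirm"], "expected_tool": None},  # destructive: must ask first
]

def run_case(case: dict) -> bool:
    result = run_agent(case["prompt"])
    text, tools = result["answer"].lower(), result["tool_calls"]
    mentions_ok = all(word in text for word in case["must_mention"])
    tool_ok = case["expected_tool"] is None or case["expected_tool"] in tools
    return mentions_ok and tool_ok

def test_behavioral_benchmark():
    passed = sum(run_case(case) for case in CASES)
    pass_rate = passed / len(CASES)
    # Fail the CI job if behavior regresses below the agreed threshold.
    assert pass_rate >= 0.9, f"behavioral pass rate {pass_rate:.0%} is below 90%"

A check like this runs on every pull request alongside unit tests; the phases below describe where it fits in the overall pipeline.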
Build Phase: Packaging Agent Dependencies Unlike traditional applications, AI agents have unique packaging requirements: Model artifacts: Fine-tuned models, embeddings, or model configurations Vector databases: Pre-computed embeddings for knowledge retrieval Tool configurations: API credentials, endpoint URLs, rate limits Prompt templates: Versioned prompt engineering assets Evaluation datasets: Test cases for agent behavior validation Best practice is to containerize everything. Docker provides reproducibility across environments and simplifies dependency management: FROM python:3.11-slim WORKDIR /app COPY . user_agent/ WORKDIR /app/user_agent RUN if [ -f requirements.txt ]; then \ pip install -r requirements.txt; \ else \ echo "No requirements.txt found"; \ fi EXPOSE 8088 CMD ["python", "main.py"] Register Phase: Version Control Beyond Git Code versioning is necessary but insufficient for AI agents. You need comprehensive artifact versioning: Container Registry: Azure Container Registry stores Docker images with semantic versioning. Each agent version becomes an immutable artifact that can be deployed or rolled back at any time. Prompt Registry: Version control your prompts separately from code. Prompt changes can dramatically impact agent behavior, so treating them as first-class artifacts enables A/B testing and rapid iteration. Configuration Management: Store agent configurations (model selection, temperature, token limits, tool permissions) in version-controlled files. This ensures reproducibility and enables easy rollback. Evaluate Phase: Testing Non-Deterministic Behavior The biggest challenge in agent CI/CD is evaluation. Unlike traditional software where unit tests verify exact outputs, agents produce variable responses that must be evaluated holistically. Behavioral Testing: Define test cases that specify desired agent behaviors rather than exact outputs. For example, "When asked about product pricing, the agent should query the pricing API, handle rate limits gracefully, and present information in a structured format." Evaluation Metrics: Track multiple dimensions: Task completion rate: Did the agent accomplish the goal? Tool usage accuracy: Did it call the right tools with correct parameters? Response quality: Measured via LLM-as-judge or human evaluation Latency: Time to first token and total response time Cost: Token usage and API call expenses Adversarial Testing: Intentionally test edge cases—ambiguous requests, tool failures, rate limiting, conflicting information. Production agents will encounter these scenarios. Recent research on CI/CD for AI agents emphasizes comprehensive instrumentation from day one. Track every input, output, API call, token usage, and decision point. After accumulating production data, patterns emerge showing which metrics actually predict failures versus noise. Deploy Phase: Safe Production Rollouts Never deploy agents directly to production. Implement progressive delivery: Staging Environment: Deploy to a staging environment that mirrors production. Run automated tests and manual QA against real data (appropriately anonymized). Canary Deployment: Route a small percentage of traffic (5-10%) to the new version. Monitor error rates, latency, user satisfaction, and cost metrics. Automatically rollback if any metric degrades beyond thresholds. Blue-Green Deployment: Maintain two production environments. Deploy to the inactive environment, verify it's healthy, then switch traffic. Enables instant rollback by switching back. 
Feature Flags: Deploy new agent capabilities behind feature flags. Gradually enable them for specific user segments, gather feedback, and iterate before full rollout. Now that we know how to create an agent using LangGraph, the next step is to understand how to deploy this LangGraph agent to Azure AI Foundry. Stage 3: Azure AI Foundry Hosted Agents Hosted agents are containerized agentic AI applications that run on Microsoft's Foundry Agent Service. They represent a paradigm shift from traditional prompt-based agents to fully code-driven, production-ready AI systems. When to Use Hosted Agents: ✅ Complex agentic workflows - Multi-step reasoning, branching logic, conditional execution ✅ Custom tool integration - External APIs, databases, internal systems ✅ Framework-specific features - LangGraph graphs, multi-agent orchestration ✅ Production scale - Enterprise deployments requiring autoscaling ✅ Auth - Identity and authentication, security and compliance controls ✅ CI/CD integration - Automated testing and deployment pipelines Why Hosted Agents Matter Hosted agents bridge the gap between experimental AI prototypes and production systems: For Developers: Full control over agent logic via code Use familiar frameworks and tools Local testing before deployment Version control for agent code For Enterprises: No infrastructure management overhead Built-in security and compliance Scalable pay-as-you-go pricing Integration with existing Azure ecosystem For AI Systems: Complex reasoning patterns beyond prompts Stateful conversations with persistence Custom tool integration and orchestration Multi-agent collaboration Before you get started with Foundry, deploy a Foundry project from the starter template using the Azure Developer CLI (azd):

# Initialize a new agent project
azd init -t https://github.com/Azure-Samples/azd-ai-starter-basic

# The template automatically provisions:
# - Foundry resource and project
# - Azure Container Registry
# - Application Insights for monitoring
# - Managed identities and RBAC

# Deploy
azd up

The extension significantly reduces the operational burden. What previously required extensive Azure knowledge and infrastructure-as-code expertise now works with a few CLI commands. Local Development to Production Workflow A streamlined workflow bridges development and production: Develop Locally: Build and test your LangGraph agent on your machine. Use the Foundry SDK to ensure compatibility with production APIs. Validate Locally: Run the agent locally against the Foundry Responses API to verify it works with production authentication and conversation management. Containerize: Package your agent in a Docker container with all dependencies. Deploy to Staging: Use azd deploy to push to a staging Foundry project. Run automated tests. Deploy to Production: Once validated, deploy to production. Foundry handles versioning, so you can maintain multiple agent versions and route traffic accordingly. Monitor and Iterate: Use Application Insights to monitor agent performance, identify issues, and plan improvements. The Azure AI Toolkit for Visual Studio offers a great place to test your hosted agent. You can also test this using REST. FROM python:3.11-slim WORKDIR /app COPY . 
user_agent/ WORKDIR /app/user_agent RUN if [ -f requirements.txt ]; then \ pip install -r requirements.txt; \ else \ echo "No requirements.txt found"; \ fi EXPOSE 8088 CMD ["python", "main.py"] Once you are able to run agent and test in local playground. You want to move to the next step of registering, evaluating and deploying agent in Microsoft AI Foundry. CI/CD with GitHub Actions This repository includes a GitHub Actions workflow (`.github/workflows/mslearnagent-AutoDeployTrigger.yml`) that automatically builds and deploys the agent to Azure when changes are pushed to the main branch. 1. Set Up Service Principal # Create service principal az ad sp create-for-rbac \ --name "github-actions-agent-deploy" \ --role contributor \ --scopes /subscriptions/$SUBSCRIPTION_ID/resourceGroups/$RESOURCE_GROUP # Output will include: # - appId (AZURE_CLIENT_ID) # - tenant (AZURE_TENANT_ID) 2. Configure Federated Credentials # For GitHub Actions OIDC az ad app federated-credential create \ --id $APP_ID \ --parameters '{ "name": "github-actions-deploy", "issuer": "https://token.actions.githubusercontent.com", "subject": "repo:YOUR_ORG/YOUR_REPO:ref:refs/heads/main", "audiences": ["api://AzureADTokenExchange"] }' 3. Set Required Permissions Critical: Service principal needs Azure AI User role on AI Services resource: # Get AI Services resource ID AI_SERVICES_ID=$(az cognitiveservices account show \ --name $AI_SERVICES_NAME \ --resource-group $RESOURCE_GROUP \ --query id -o tsv) # Assign Azure AI User role az role assignment create \ --assignee $SERVICE_PRINCIPAL_ID \ --role "Azure AI User" \ --scope $AI_SERVICES_ID 4. Configure GitHub Secrets Go to GitHub repository → Settings → Secrets and variables → Actions Add the following secrets: AZURE_CLIENT_ID=<from-service-principal> AZURE_TENANT_ID=<from-service-principal> AZURE_SUBSCRIPTION_ID=<your-subscription-id> AZURE_AI_PROJECT_ENDPOINT=<your-project-endpoint> ACR_NAME=<your-acr-name> IMAGE_NAME=calculator-agent AGENT_NAME=CalculatorAgent 5. Create GitHub Actions Workflow Create .github/workflows/deploy-agent.yml: name: Deploy Agent to Azure AI Foundry on: push: branches: - main paths: - 'main.py' - 'custom_state_converter.py' - 'requirements.txt' - 'Dockerfile' workflow_dispatch: inputs: version_tag: description: 'Version tag (leave empty for auto-increment)' required: false type: string permissions: id-token: write contents: read jobs: deploy: runs-on: ubuntu-latest steps: - name: Checkout code uses: actions/checkout@v4 - name: Generate version tag id: version run: | if [ -n "${{ github.event.inputs.version_tag }}" ]; then echo "VERSION=${{ github.event.inputs.version_tag }}" >> $GITHUB_OUTPUT else # Auto-increment version VERSION="v$(date +%Y%m%d-%H%M%S)" echo "VERSION=$VERSION" >> $GITHUB_OUTPUT fi - name: Azure Login (OIDC) uses: azure/login@v1 with: client-id: ${{ secrets.AZURE_CLIENT_ID }} tenant-id: ${{ secrets.AZURE_TENANT_ID }} subscription-id: ${{ secrets.AZURE_SUBSCRIPTION_ID }} - name: Set up Python uses: actions/setup-python@v4 with: python-version: '3.11' - name: Install Azure AI SDK run: | pip install azure-ai-projects azure-identity - name: Build and push Docker image run: | az acr build \ --registry ${{ secrets.ACR_NAME }} \ --image ${{ secrets.IMAGE_NAME }}:${{ steps.version.outputs.VERSION }} \ --image ${{ secrets.IMAGE_NAME }}:latest \ --file Dockerfile \ . 
- name: Register agent version env: AZURE_AI_PROJECT_ENDPOINT: ${{ secrets.AZURE_AI_PROJECT_ENDPOINT }} ACR_NAME: ${{ secrets.ACR_NAME }} IMAGE_NAME: ${{ secrets.IMAGE_NAME }} AGENT_NAME: ${{ secrets.AGENT_NAME }} VERSION: ${{ steps.version.outputs.VERSION }} run: | python - <<EOF import os from azure.ai.projects import AIProjectClient from azure.identity import DefaultAzureCredential from azure.ai.projects.models import ImageBasedHostedAgentDefinition project_endpoint = os.environ["AZURE_AI_PROJECT_ENDPOINT"] credential = DefaultAzureCredential() project_client = AIProjectClient.from_connection_string( credential=credential, conn_str=project_endpoint, ) agent_name = os.environ["AGENT_NAME"] version = os.environ["VERSION"] image_uri = f"{os.environ['ACR_NAME']}.azurecr.io/{os.environ['IMAGE_NAME']}:{version}" agent_definition = ImageBasedHostedAgentDefinition( image=image_uri, cpu=1.0, memory="2Gi", ) agent = project_client.agents.create_or_update( agent_id=agent_name, body=agent_definition ) print(f"Agent version registered: {agent.version}") EOF - name: Start agent run: | echo "Agent deployed successfully with version ${{ steps.version.outputs.VERSION }}" - name: Deployment summary run: | echo "### Deployment Summary" >> $GITHUB_STEP_SUMMARY echo "- **Agent Name**: ${{ secrets.AGENT_NAME }}" >> $GITHUB_STEP_SUMMARY echo "- **Version**: ${{ steps.version.outputs.VERSION }}" >> $GITHUB_STEP_SUMMARY echo "- **Image**: ${{ secrets.ACR_NAME }}.azurecr.io/${{ secrets.IMAGE_NAME }}:${{ steps.version.outputs.VERSION }}" >> $GITHUB_STEP_SUMMARY echo "- **Status**: Deployed" >> $GITHUB_STEP_SUMMARY 6. Add Container Status Verification To ensure deployments are truly successful, add a script to verify container startup before marking the pipeline as complete. Create wait_for_container.py: """ Wait for agent container to be ready. This script polls the agent container status until it's running successfully or times out. Designed for use in CI/CD pipelines to verify deployment. """ import os import sys import time import requests from typing import Optional, Dict, Any from azure.identity import DefaultAzureCredential class ContainerStatusWaiter: """Polls agent container status until ready or timeout.""" def __init__( self, project_endpoint: str, agent_name: str, agent_version: str, timeout_seconds: int = 600, poll_interval: int = 10, ): """ Initialize the container status waiter. 
Args: project_endpoint: Azure AI Foundry project endpoint agent_name: Name of the agent agent_version: Version of the agent timeout_seconds: Maximum time to wait (default: 10 minutes) poll_interval: Seconds between status checks (default: 10s) """ self.project_endpoint = project_endpoint.rstrip("/") self.agent_name = agent_name self.agent_version = agent_version self.timeout_seconds = timeout_seconds self.poll_interval = poll_interval self.api_version = "2025-11-15-preview" # Get Azure AD token credential = DefaultAzureCredential() token = credential.get_token("https://ml.azure.com/.default") self.headers = { "Authorization": f"Bearer {token.token}", "Content-Type": "application/json", } def _get_container_url(self) -> str: """Build the container status URL.""" return ( f"{self.project_endpoint}/agents/{self.agent_name}" f"/versions/{self.agent_version}/containers/default" ) def get_container_status(self) -> Optional[Dict[str, Any]]: """Get current container status.""" url = f"{self._get_container_url()}?api-version={self.api_version}" try: response = requests.get(url, headers=self.headers, timeout=30) if response.status_code == 200: return response.json() elif response.status_code == 404: return None else: print(f"⚠️ Warning: GET container returned {response.status_code}") return None except Exception as e: print(f"⚠️ Warning: Error getting container status: {e}") return None def wait_for_container_running(self) -> bool: """ Wait for container to reach running state. Returns: True if container is running, False if timeout or error """ print(f"\n🔍 Checking container status for {self.agent_name} v{self.agent_version}") print(f"⏱️ Timeout: {self.timeout_seconds}s | Poll interval: {self.poll_interval}s") print("-" * 70) start_time = time.time() iteration = 0 while time.time() - start_time < self.timeout_seconds: iteration += 1 elapsed = int(time.time() - start_time) container = self.get_container_status() if not container: print(f"[{iteration}] ({elapsed}s) ⏳ Container not found yet, waiting...") time.sleep(self.poll_interval) continue # Extract status information status = ( container.get("status") or container.get("state") or container.get("provisioningState") or "Unknown" ) # Check for replicas information replicas = container.get("replicas", {}) ready_replicas = replicas.get("ready", 0) desired_replicas = replicas.get("desired", 0) print(f"[{iteration}] ({elapsed}s) 📊 Status: {status}") if replicas: print(f" 🔢 Replicas: {ready_replicas}/{desired_replicas} ready") # Check if container is running and ready if status.lower() in ["running", "succeeded", "ready"]: if desired_replicas == 0 or ready_replicas >= desired_replicas: print("\n" + "=" * 70) print("✅ Container is running and ready!") print("=" * 70) return True elif status.lower() in ["failed", "error", "cancelled"]: print("\n" + "=" * 70) print(f"❌ Container failed to start: {status}") print("=" * 70) return False time.sleep(self.poll_interval) # Timeout reached print("\n" + "=" * 70) print(f"⏱️ Timeout reached after {self.timeout_seconds}s") print("=" * 70) return False def main(): """Main entry point for CLI usage.""" project_endpoint = os.getenv("AZURE_AI_PROJECT_ENDPOINT") agent_name = os.getenv("AGENT_NAME") agent_version = os.getenv("AGENT_VERSION") timeout = int(os.getenv("TIMEOUT_SECONDS", "600")) poll_interval = int(os.getenv("POLL_INTERVAL_SECONDS", "10")) if not all([project_endpoint, agent_name, agent_version]): print("❌ Error: Missing required environment variables") sys.exit(1) waiter = ContainerStatusWaiter( 
project_endpoint=project_endpoint, agent_name=agent_name, agent_version=agent_version, timeout_seconds=timeout, poll_interval=poll_interval, ) success = waiter.wait_for_container_running() sys.exit(0 if success else 1) if __name__ == "__main__": main() Key Features: REST API Polling: Uses Azure AI Foundry REST API to check container status Timeout Handling: Configurable timeout (default 10 minutes) Progress Tracking: Shows iteration count and elapsed time Replica Checking: Verifies all desired replicas are ready Clear Output: Emoji-enhanced status messages for easy reading Exit Codes: Returns 0 for success, 1 for failure (CI/CD friendly) Update workflow to include verification: Add this step after starting the agent: - name: Start the new agent version id: start_agent env: FOUNDRY_ACCOUNT: ${{ steps.foundry_details.outputs.FOUNDRY_ACCOUNT }} PROJECT_NAME: ${{ steps.foundry_details.outputs.PROJECT_NAME }} AGENT_NAME: ${{ secrets.AGENT_NAME }} run: | LATEST_VERSION=$(az cognitiveservices agent show \ --account-name "$FOUNDRY_ACCOUNT" \ --project-name "$PROJECT_NAME" \ --name "$AGENT_NAME" \ --query "versions.latest.version" -o tsv) echo "AGENT_VERSION=$LATEST_VERSION" >> $GITHUB_OUTPUT az cognitiveservices agent start \ --account-name "$FOUNDRY_ACCOUNT" \ --project-name "$PROJECT_NAME" \ --name "$AGENT_NAME" \ --agent-version $LATEST_VERSION - name: Wait for container to be ready env: AZURE_AI_PROJECT_ENDPOINT: ${{ secrets.AZURE_AI_PROJECT_ENDPOINT }} AGENT_NAME: ${{ secrets.AGENT_NAME }} AGENT_VERSION: ${{ steps.start_agent.outputs.AGENT_VERSION }} TIMEOUT_SECONDS: 600 POLL_INTERVAL_SECONDS: 15 run: | echo "⏳ Waiting for container to be ready..." python wait_for_container.py - name: Deployment Summary if: success() run: | echo "## Deployment Complete! 🚀" >> $GITHUB_STEP_SUMMARY echo "" >> $GITHUB_STEP_SUMMARY echo "- **Agent**: ${{ secrets.AGENT_NAME }}" >> $GITHUB_STEP_SUMMARY echo "- **Version**: ${{ steps.version.outputs.VERSION }}" >> $GITHUB_STEP_SUMMARY echo "- **Status**: ✅ Container running and ready" >> $GITHUB_STEP_SUMMARY echo "" >> $GITHUB_STEP_SUMMARY echo "### Deployment Timeline" >> $GITHUB_STEP_SUMMARY echo "1. ✅ Image built and pushed to ACR" >> $GITHUB_STEP_SUMMARY echo "2. ✅ Agent version registered" >> $GITHUB_STEP_SUMMARY echo "3. ✅ Container started" >> $GITHUB_STEP_SUMMARY echo "4. ✅ Container verified as running" >> $GITHUB_STEP_SUMMARY - name: Deployment Failed Summary if: failure() run: | echo "## Deployment Failed ❌" >> $GITHUB_STEP_SUMMARY echo "" >> $GITHUB_STEP_SUMMARY echo "Please check the logs above for error details." >> $GITHUB_STEP_SUMMARY Benefits of Container Status Verification: Deployment Confidence: Know for certain that the container started successfully Early Failure Detection: Catch startup errors before users are affected CI/CD Gate: Pipeline only succeeds when container is actually ready Debugging Aid: Clear logs show container startup progress Timeout Protection: Prevents infinite waits with configurable timeout REST API Endpoints Used: GET {endpoint}/agents/{agent_name}/versions/{agent_version}/containers/default?api-version=2025-11-15-preview Response includes: status or state: Container state (Running, Failed, etc.) replicas.ready: Number of ready replicas replicas.desired: Target number of replicas error: Error details if failed Container States: Running/Ready: Container is operational InProgress: Container is starting up Failed/Error: Container failed to start Stopped: Container was stopped 7. 
Trigger Deployment

# Automatic trigger - push to main
git add .
git commit -m "Update agent implementation"
git push origin main

# Manual trigger - via GitHub UI
# Go to Actions → Deploy Agent to Azure AI Foundry → Run workflow

This will trigger the workflow as soon as you check in the implementation code. You can play with the agent in the Foundry UI. Evaluation is now part of the workflow, and you can also visualize the evaluation results in AI Foundry. Best Practices for Production Agent LLMOps 1. Start with Simple Workflows, Add Complexity Gradually Don't build a complex multi-agent system on day one. Start with a single agent that does one task well. Once that's stable in production, add additional capabilities: Single agent with basic tool calling Add memory/state for multi-turn conversations Introduce specialized sub-agents for complex tasks Implement multi-agent collaboration This incremental approach reduces risk and enables learning from real usage before investing in advanced features. 2. Instrument Everything from Day One The worst time to add observability is after you have a production incident. Comprehensive instrumentation should be part of your initial development: Log every LLM call with inputs, outputs, token usage Track all tool invocations Record decision points in agent reasoning Capture timing metrics for every operation Log errors with full context After accumulating production data, you'll identify which metrics matter most. But you can't retroactively add logging for incidents that already occurred. 3. Build Evaluation into the Development Process Don't wait until deployment to evaluate agent quality. Integrate evaluation throughout development: Maintain a growing set of test conversations Run evaluations on every code change Track metrics over time to identify regressions Include diverse scenarios—happy path, edge cases, adversarial inputs Use LLM-as-judge for scalable automated evaluation, supplemented with periodic human review of sample outputs. 4. Embrace Non-Determinism, But Set Boundaries Agents are inherently non-deterministic, but that doesn't mean anything goes: Set acceptable ranges for variability in testing Use temperature and sampling controls to manage randomness Implement retry logic with exponential backoff Add fallback behaviors for when primary approaches fail Use assertions to verify critical invariants (e.g., "agent must never perform destructive actions without confirmation") 5. Prioritize Security and Governance from Day One Security shouldn't be an afterthought: Use managed identities and RBAC for all resource access Implement least-privilege principles—agents get only necessary permissions Add content filtering for inputs and outputs Monitor for prompt injection and jailbreak attempts Maintain audit logs for compliance Regularly review and update security policies 6. Design for Failure Your agents will fail. Design systems that degrade gracefully: Implement retry logic for transient failures Provide clear error messages to users Include fallback behaviors (e.g., escalate to human support) Never leave users stuck—always provide a path forward Log failures with full context for post-incident analysis 7. Balance Automation with Human Oversight Fully autonomous agents are powerful but risky. Consider human-in-the-loop workflows for high-stakes decisions: Draft responses that require approval before sending Request confirmation before executing destructive actions Escalate ambiguous situations to human operators Provide clear audit trails of agent actions
8. Manage Costs Proactively LLM API costs can escalate quickly at scale: Monitor token usage per conversation Set per-conversation token limits Use caching for repeated queries Choose appropriate models (not always the largest) Consider local models for suitable use cases Alert on cost anomalies that indicate runaway loops 9. Plan for Continuous Learning Agents should improve over time: Collect feedback on agent responses (thumbs up/down) Analyze conversations that required escalation Identify common failure patterns Fine-tune models on production interaction data (with appropriate consent) Iterate on prompts based on real usage Share learnings across the team 10. Document Everything Comprehensive documentation is critical as teams scale: Agent architecture and design decisions Tool configurations and API contracts Deployment procedures and runbooks Incident response procedures Version migration guides Evaluation methodologies Conclusion You now have a complete, production-ready AI agent deployed to Azure AI Foundry with: LangGraph-based agent orchestration Tool-calling capabilities Multi-turn conversation support Containerized deployment CI/CD automation Evaluation framework Multiple client implementations Key Takeaways LangGraph provides flexible agent orchestration with state management Azure AI Agent Server SDK simplifies deployment to Azure AI Foundry Custom state converter is critical for production deployments with tool calls CI/CD automation enables rapid iteration and deployment Evaluation framework ensures agent quality and performance Resources Azure AI Foundry Documentation LangGraph Documentation Azure AI Agent Server SDK OpenAI Responses API Thanks Manoranjan Rajguru https://www.linkedin.com/in/manoranjan-rajguru/

Evaluating Generative AI Models Using Microsoft Foundry's Continuous Evaluation Framework
In this article, we'll explore how to design, configure, and operationalize model evaluation using Microsoft Foundry's built-in capabilities and best practices. Why Continuous Evaluation Matters Unlike traditional static applications, Generative AI systems evolve due to: New prompts Updated datasets Versioned or fine-tuned models Reinforcement loops Without ongoing evaluation, teams risk quality degradation, hallucinations, and unintended bias moving into production. How evaluation differs - Traditional Apps vs Generative AI Models Functionality: Unit tests vs. content quality and factual accuracy Performance: Latency and throughput vs. relevance and token efficiency Safety: Vulnerability scanning vs. harmful or policy-violating outputs Reliability: CI/CD testing vs. continuous runtime evaluation Continuous evaluation bridges these gaps — ensuring that AI systems remain accurate, safe, and cost-efficient throughout their lifecycle. Step 1 — Set Up Your Evaluation Project in Microsoft Foundry Open Microsoft Foundry Portal → navigate to your workspace. Click "Evaluation" from the left navigation pane. Create a new Evaluation Pipeline and link your Foundry-hosted model endpoint, including Foundry-managed Azure OpenAI models or custom fine-tuned deployments. Choose or upload your test dataset — e.g., sample prompts and expected outputs (ground truth). Example CSV:

prompt | expected response
Summarize this article about sustainability. | A concise, factual summary without personal opinions.
Generate a polite support response for a delayed shipment. | Apologetic, empathetic tone acknowledging the delay.

Step 2 — Define Evaluation Metrics Microsoft Foundry supports both built-in metrics and custom evaluators that measure the quality and responsibility of model responses.

Category | Example Metric | Purpose
Quality | Relevance, Fluency, Coherence | Assess linguistic and contextual quality
Factual Accuracy | Groundedness (how well responses align with verified source data), Correctness | Ensure information aligns with source content
Safety | Harmfulness, Policy Violation | Detect unsafe or biased responses
Efficiency | Latency, Token Count | Measure operational performance
User Experience | Helpfulness, Tone, Completeness | Evaluate from human interaction perspective

Step 3 — Run Evaluation Pipelines Once configured, click "Run Evaluation" to start the process. Microsoft Foundry automatically sends your prompts to the model, compares responses with the expected outcomes, and computes all selected metrics. Sample Python SDK snippet:

from azure.ai.evaluation import evaluate_model

evaluate_model(
    model="gpt-4o",
    dataset="customer_support_evalset",
    metrics=["relevance", "fluency", "safety", "latency"],
    output_path="evaluation_results.json"
)

This generates structured evaluation data that can be visualized in the Evaluation Dashboard or queried using KQL (Kusto Query Language - the query language used across Azure Monitor and Application Insights) in Application Insights. Step 4 — Analyze Evaluation Results After the run completes, navigate to the Evaluation Dashboard. You'll find detailed insights such as: Overall model quality score (e.g., 0.91 composite score) Token efficiency per request Safety violation rate (e.g., 0.8% unsafe responses) Metric trends across model versions Example summary table:

Metric | Target | Current | Trend
Relevance | >0.9 | 0.94 | ✅ Stable
Fluency | >0.9 | 0.91 | ✅ Improving
Safety | <1% | 0.6% | ✅ On track
Latency | <2s | 1.8s | ✅ Efficient

Step 5 — Automate and integrate with MLOps Continuous Evaluation works best when it's part of your DevOps or MLOps pipeline. Integrate with Azure DevOps or GitHub Actions using the Foundry SDK. Run evaluation automatically on every model update or deployment. Set alerts in Azure Monitor to notify when quality or safety drops below threshold. Example workflow: 🧩 Prompt Update → Evaluation Run → Results Logged → Metrics Alert → Model Retraining Triggered.
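To make this automation concrete, here is a minimal sketch of a quality gate that a pipeline step could run after the evaluate_model call above. The JSON field names ("metrics" and per-metric averages) are assumptions for illustration — adjust them to the actual structure your evaluation run writes to evaluation_results.json — and the thresholds simply mirror the targets from the summary table.

import json
import sys

# Minimum acceptable averages, mirroring the targets in Step 4 (assumed values).
THRESHOLDS = {"relevance": 0.90, "fluency": 0.90, "safety": 0.99}

def main(path: str = "evaluation_results.json") -> int:
    with open(path) as f:
        results = json.load(f)

    # Assumed shape: {"metrics": {"relevance": 0.94, "fluency": 0.91, ...}}
    metrics = results.get("metrics", results)

    failures = []
    for name, minimum in THRESHOLDS.items():
        score = metrics.get(name)
        if score is None or score < minimum:
            failures.append(f"{name}: {score} < {minimum}")

    if failures:
        print("Evaluation gate FAILED:\n  " + "\n  ".join(failures))
        return 1  # non-zero exit fails the pipeline step and blocks the release

    print("Evaluation gate passed.")
    return 0

if __name__ == "__main__":
    sys.exit(main())

Wired into Azure DevOps or GitHub Actions, a failing gate stops the deployment and can raise the alert that triggers retraining.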
Step 6 — Apply Responsible AI & Human Review Microsoft Foundry integrates Responsible AI and safety evaluation directly through Foundry safety evaluators and Azure AI services. These evaluators help detect harmful, biased, or policy-violating outputs during continuous evaluation runs. Example:

Test Prompt | Before Evaluation | After Evaluation
"What is the refund policy?" | Vague, hallucinated details | Precise, aligned to source content, compliant tone

Quick Checklist for Implementing Continuous Evaluation Define expected outputs or ground-truth datasets Select quality + safety + efficiency metrics Automate evaluations in CI/CD or MLOps pipelines Set alerts for drift, hallucination, or cost spikes Review metrics regularly and retrain/update models When to trigger re-evaluation Re-evaluation should occur not only during deployment, but also when prompts evolve, new datasets are ingested, models are fine-tuned, or usage patterns shift. Key Takeaways Continuous Evaluation is essential for maintaining AI quality and safety at scale. Microsoft Foundry offers an integrated evaluation framework — from datasets to dashboards — within your existing Azure ecosystem. You can combine automated metrics, human feedback, and responsible AI checks for holistic model evaluation. Embedding evaluation into your CI/CD workflows ensures ongoing trust and transparency in every release. Useful Resources Microsoft Foundry Documentation - Microsoft Foundry documentation | Microsoft Learn Microsoft Foundry-managed Azure AI Evaluation SDK - Local Evaluation with the Azure AI Evaluation SDK - Microsoft Foundry | Microsoft Learn Responsible AI Practices - What is Responsible AI - Azure Machine Learning | Microsoft Learn GitHub: Microsoft Foundry Samples - azure-ai-foundry/foundry-samples: Embedded samples in Azure AI Foundry docs

OpenAI's GPT-5.1-codex-max in Microsoft Foundry: Igniting a New Era for Enterprise Developers
Announcing GPT-5.1-codex-max: The Future of Enterprise Coding Starts Now We're thrilled to announce the general availability of OpenAI's GPT-5.1-codex-max in Microsoft Foundry Models, a leap forward that redefines what's possible for enterprise-grade coding agents. This isn't just another model release; it's a celebration of innovation, partnership, and the relentless pursuit of developer empowerment. At Microsoft Ignite, we unveiled Microsoft Foundry: a unified platform where businesses can confidently choose the right model for every job, backed by enterprise-grade reliability. Foundry brings together the best from OpenAI, Anthropic, xAI, Black Forest Labs, Cohere, Meta, Mistral, and Microsoft's own breakthroughs, all under one roof. Our partnership with Anthropic is a testament to our commitment to giving developers access to the most advanced, safe, and high-performing models in the industry. And now, with GPT-5.1-codex-max joining the Foundry family, the possibilities for intelligent applications and agentic workflows have never been greater. GPT-5.1-codex-max is available today in Microsoft Foundry and accessible in Visual Studio Code via the Foundry extension. Meet GPT-5.1-codex-max: Enterprise-Grade Coding Agent for Complex Projects GPT-5.1-codex-max is engineered for those who build the future. Imagine tackling complex, long-running projects without losing context or momentum. GPT-5.1-codex-max delivers efficiency at scale, cross-platform readiness, and proven performance with top scores on SWE-Bench (77.9), the gold standard for AI coding. With GPT-5.1-codex-max, developers can focus on creativity and problem-solving, while the model handles the heavy lifting. GPT-5.1-codex-max isn't just powerful; it's practical, designed to solve real challenges for enterprise developers: Multi-Agent Coding Workflows: Automate repetitive tasks across microservices, maintaining shared context for seamless collaboration. Enterprise App Modernization: Effortlessly refactor legacy .NET and Java applications into cloud-native architectures. Secure API Development: Generate and validate secure API endpoints, with compliance checks built in for peace of mind. Continuous Integration Support: Integrate GPT-5.1-codex-max into CI/CD pipelines for automated code reviews and test generation, accelerating delivery cycles. These use cases are just the beginning. GPT-5.1-codex-max is your partner in building robust, scalable, and secure solutions. Foundry: Platform Built for Developers Who Build the Future Foundry is more than a model catalog—it's an enterprise AI platform designed for developers who need choice, reliability, and speed. • Choice Without Compromise: Access the widest range of models, including frontier models from leading model providers. • Enterprise-Grade Infrastructure: Built-in security, observability, and governance for responsible AI at scale. • Integrated Developer Experience: From GitHub to Visual Studio Code, Foundry connects with tools developers love for a frictionless build-to-deploy journey. Start Building Smarter with GPT-5.1-codex-max in Foundry The future is here, and it's yours to shape. Supercharge your coding workflows with GPT-5.1-codex-max in Microsoft Foundry today. Learn more about Microsoft Foundry: aka.ms/IgniteFoundryModels. Watch Ignite sessions for deep dives and demos: ignite.microsoft.com. Build faster, smarter, and with confidence on the platform redefining enterprise AI.
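To get a first feel for the model from code, here is a minimal sketch that calls a deployment through an OpenAI-compatible chat completions endpoint using the OpenAI Python SDK. The environment variable names and the deployment name are placeholders — take the real endpoint and deployment values from your Foundry project, which may also use Microsoft Entra ID authentication instead of an API key.

import os
from openai import OpenAI

# Placeholders: copy the endpoint and deployment name from your Foundry project.
client = OpenAI(
    base_url=os.environ["FOUNDRY_OPENAI_BASE_URL"],  # assumed environment variable
    api_key=os.environ["FOUNDRY_API_KEY"],           # assumed environment variable
)

response = client.chat.completions.create(
    model="gpt-5.1-codex-max",  # use your actual deployment name
    messages=[
        {"role": "system", "content": "You are a careful senior code reviewer."},
        {"role": "user", "content": "Review this function for thread-safety issues:\n"
                                    "def incr(c): c['n'] = c.get('n', 0) + 1"},
    ],
)

print(response.choices[0].message.content)

The same pattern drops into a CI job for automated code review, with the diff of a pull request passed in as the user message.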
Introducing OpenAI's GPT-image-1.5 in Microsoft Foundry

Developers building with visual AI can often run into the same frustrations: images that drift from the prompt, inconsistent object placement, text that renders unpredictably, and editing workflows that break when iterating on a single asset. That's why we are excited to announce OpenAI's GPT Image 1.5 is now generally available in Microsoft Foundry. This model can bring sharper image fidelity, stronger prompt alignment, and faster image generation that supports iterative workflows. Starting today, customers can request access to the model and start building in the Foundry platform. Meet GPT Image 1.5 AI-driven image generation began with early models like OpenAI's DALL-E, which introduced the ability to transform text prompts into visuals. Since then, image generation models have been evolving to enhance multimodal AI across industries. GPT Image 1.5 represents continuous improvement in enterprise-grade image generation. Building on the success of GPT Image 1 and GPT Image 1 mini, these enhanced models introduce advanced capabilities that cater to both creative and operational needs. The new image models offer: Text-to-image: Stronger instruction following and highly precise editing. Image-to-image: Transform existing images to iteratively refine specific regions. Improved visual fidelity: More detailed scenes and realistic rendering. Accelerated creation times: Up to 4x faster generation speed. Enterprise integration: Deploy and scale securely in Microsoft Foundry. GPT Image 1.5 delivers stronger image preservation and editing capabilities, maintaining critical details like facial likeness, lighting, composition, and color tone across iterative changes. You'll see more consistent preservation of branded logos and key visuals, making it especially powerful for marketing, brand design, and ecommerce workflows—from graphics and logo creation to generating full product catalogs (variants, environments, and angles) from a single source image. Benchmarks Based on an internal Microsoft dataset, GPT Image 1.5 scores higher than other image generation models in prompt alignment and infographics tasks. It focuses on making clear, strong edits, performing best on single-turn modification and delivering the highest visual quality in both single- and multi-turn settings. The following results were found across image generation and editing:

Text to image
Model | Prompt alignment | Diagram / Flowchart
GPT Image 1.5 | 91.2% | 96.9%
GPT Image 1 | 87.3% | 90.0%
Qwen Image | 83.9% | 33.9%
Nano Banana Pro | 87.9% | 95.3%

Image editing
Model | Modification (BinaryEval) | Preservation (SC, semantic) | Preservation (DINO, visual) | Visual Quality (BinaryEval) | Face Preservation (AuraFace)
Single-turn
GPT Image 1 | 99.2% | 51.0% | 0.14 | 79.5% | 0.30
Qwen Image | 81.9% | 63.9% | 0.44 | 76.0% | 0.85
GPT Image 1.5 | 100% | 56.77% | 0.14 | 89.96% | 0.39
Multi-turn
GPT Image 1 | 93.5% | 54.7% | 0.10 | 82.8% | 0.24
Qwen Image | 77.3% | 68.2% | 0.43 | 77.6% | 0.63
GPT Image 1.5 | 92.49% | 60.55% | 0.15 | 89.46% | 0.28

Using GPT Image 1.5 across industries Whether you're creating immersive visuals for campaigns, accelerating UI and product design, or producing assets for interactive learning, GPT Image 1.5 gives modern enterprises the flexibility and scalability they need. Image models can allow teams to drive deeper engagement through compelling visuals, speed up design cycles for apps, websites, and marketing initiatives, and support inclusivity by generating accessible, high‑quality content for diverse audiences.
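For teams that want to try this from code, here is a minimal sketch using the OpenAI Python SDK against an OpenAI-compatible images endpoint. The environment variables and the deployment name are placeholders from your Foundry project, and the response field shown (base64-encoded image data) is an assumption that may vary with the API version you target.

import base64
import os
from openai import OpenAI

# Placeholders: use the endpoint and deployment name from your Foundry project.
client = OpenAI(
    base_url=os.environ["FOUNDRY_OPENAI_BASE_URL"],  # assumed environment variable
    api_key=os.environ["FOUNDRY_API_KEY"],           # assumed environment variable
)

result = client.images.generate(
    model="gpt-image-1.5",  # use your actual deployment name
    prompt="Studio photo of a teal ceramic travel mug on a light oak table, "
           "soft morning light, product-catalog style",
    size="1024x1024",
)

# Assumed response shape: base64-encoded image bytes in the first data element.
image_bytes = base64.b64decode(result.data[0].b64_json)
with open("product_mug.png", "wb") as f:
    f.write(image_bytes)

Swapping images.generate for the corresponding edit operation, with a source image attached, covers the iterative refinement workflow described above.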
Watch how Foundry enables developers to iterate with multimodal AI across Black Forest Labs, OpenAI, and more: Microsoft Foundry empowers organizations to deploy these capabilities at scale, integrating image generation seamlessly into enterprise workflows. Explore the use of AI image generation here across industries like: Retail: Generate product imagery for catalogs, e-commerce listings, and personalized shopping experiences. Marketing: Create campaign visuals and social media graphics. Education: Develop interactive learning materials or visual aids. Entertainment: Edit storyboards, character designs, and dynamic scenes for films and games. UI/UX: Accelerate design workflows for apps and websites. Microsoft Foundry provides security and compliance with built-in content safety filters, role-based access, network isolation, and Azure Monitor logging. Integrated governance via Azure Policy, Purview, and Sentinel gives teams real-time visibility and control, so privacy and safety are embedded in every deployment. Learn more about responsible AI at Microsoft. Pricing

Model: GPT-image-1.5 (Global), priced per 1M tokens
Input Tokens: $8
Cached Input Tokens: $2
Output Tokens: $32

Cost efficiency improves as well: image inputs and outputs are now cheaper compared to GPT Image 1, enabling organizations to generate and iterate on more creative assets within the same budget. For detailed pricing, refer here. Getting started Learn more about image generation, explore code samples, and read about responsible AI protections here. Try GPT Image 1.5 in Microsoft Foundry and start building multimodal experiences today. Whether you're designing educational materials, crafting visual narratives, or accelerating UI workflows, these models deliver the flexibility and performance your organization needs.

Fine-tuning at Ignite 2025: new models, new tools, new experience
Fine‑tuning isn’t just “better prompts.” It’s how you tailor a foundation model to your domain and tasks to get higher accuracy, lower cost, and faster responses -- then run it at scale. As Agents become more critical to businesses, we’re seeing growing demand for fine tuning to ensure agents are low latency, low cost, and call the right tools and the right time. At Ignite 2025, we saw how Docusign fine-tuned models that powered their document management system to achieve major gains: more than 50% cost reduction per document, 2x faster inference time, and significant improvements in accuracy. At Ignite, we launched several new features in Microsoft Foundry that make fine‑tuning easier, more scalable, and more impactful than ever with the goal of making agents unstoppable in the real world: New Open-Source models – Qwen3 32B, Ministral 3B, GPT-OSS-20B and Llama 3.3 70B – to give users access to Open-Source models in the same low friction experience as OpenAI Synthetic data generation to jump start your training journey – just upload your documents and our multi-agent system takes care of the rest Developer Training tier to reduce the barrier to entry by offering discounted training (50% off global!) on spot capacity Agentic Reinforcement Fine-tuning with GPT-5: leverage tool calling during chain of thought to teach reasoning models to use your tools to solve complex problems And if that wasn’t enough, we also released a re-imagined fine tuning experience in Foundry (new), providing access to all these capabilities in a simplified and unified UI. New Open-Source Models for Fine-tuning (Public Preview): Bringing open-source innovation to your fingertips We’ve expanded our model lineup to new open-source models you can fine-tune without worrying about GPUs or compute. Ministral-3B and Qwen3 32B are now available to fine-tune with Supervised Fine-Tuning (SFT) in Microsoft Foundry, enabling developers to adapt open-source models to their enterprise-specific domains with ease. Look out for Llama 3.3 70B and GPT-OSS-20B, coming next week! These OSS models are offered through a unified interface with OpenAI via the UI or Foundry SDK which means the same experience, regardless of model choice. These models can be used alongside your favorite Foundry tools, from AI Search to Evaluations, or to power your agents. Note: New OSS models are only available in "New" Foundry – so upgrade today! Like our OpenAI models, Open-Source models in Foundry charge per-token for training, making it simple to forecast and estimate your costs. All models are available on Global Standard tier, making discoverability easy. For more details on pricing, please see our Microsoft Foundry Models pricing page. Customers like Co-Star Group have already seen success leveraging fine tuning with Mistral models to power their home search experience on Homes.com. They selected Ministral-3B as a small, efficient model to power high volume, low latency processing with lower costs and faster deployment times than Frontier models – while still meeting their needs for accuracy, scalability, and availability thanks to fine tuning in Foundry. Synthetic data generation (Public Preview): Create high-quality training data automatically Developers can now generate high-quality, domain-specific synthetic datasets to close those persistent data gaps with synthetic data generation. 
Synthetic data generation (Public Preview): Create high-quality training data automatically

Developers can now generate high-quality, domain-specific synthetic datasets to close those persistent data gaps with synthetic data generation.

One of the biggest challenges we hear teams face during fine-tuning is not having enough data, or the right kind of data, because it’s scarce, sensitive, or locked behind compliance constraints (think healthcare and finance). Our new synthetic data generation capability solves this by giving you a safe, scalable way to create realistic, diverse datasets tailored to your use case, so you can fine-tune and evaluate models without waiting for perfect real-world data. Now you can produce realistic question–answer pairs from your documents, or simulate multi‑turn tool‑use dialogues that include function calls, without touching sensitive production data.

How it works:

Fine‑tuning datasets: Upload a reference file (PDF/Markdown/TXT) and Foundry converts it into SFT‑formatted Q&A pairs that reflect your domain’s language and nuances, so your model learns from the right examples.
Agent tool‑use datasets: Provide an OpenAPI (Swagger) spec, and Foundry simulates multi‑turn assistant–user conversations with tool calls, producing SFT‑ready examples that teach models to call your APIs reliably.
Evaluation datasets: Generate distinct test queries tailored to your scenarios so you can measure model and agent quality objectively—separate from your training data to avoid false confidence.

Agents succeed when they reliably understand domain intent and call the right tools at the right time. Foundry’s synthetic data generation does exactly that: it creates task‑specific training and test data so your agent learns from the right examples, and you can prove it works before you go live, making it reliable in the real world.
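To make the generated output concrete: SFT-formatted data is typically a JSONL file in which each line is one chat-style example. The sketch below writes two illustrative records, a Q&A pair and a tool-use conversation, into the hypothetical support_sft.jsonl file referenced in the earlier sketch. The questions, answers, and the get_claim_status function are invented for the example and are not actual Foundry output; a full tool-use example would usually also carry the tool definitions.

```python
import json

# Two illustrative SFT records in the common chat fine-tuning format.
qa_example = {
    "messages": [
        {"role": "system", "content": "You are a benefits policy assistant."},
        {"role": "user", "content": "How long do I have to submit a claim?"},
        {"role": "assistant", "content": "Claims must be submitted within 90 days of the date of service."},
    ]
}

tool_use_example = {
    "messages": [
        {"role": "user", "content": "What's the status of claim 12345?"},
        {
            "role": "assistant",
            "content": None,
            "tool_calls": [{
                "id": "call_1",
                "type": "function",
                "function": {"name": "get_claim_status", "arguments": json.dumps({"claim_id": "12345"})},
            }],
        },
        {"role": "tool", "tool_call_id": "call_1", "content": json.dumps({"status": "approved"})},
        {"role": "assistant", "content": "Claim 12345 has been approved."},
    ]
}

# Each line of the JSONL training file is one complete example.
with open("support_sft.jsonl", "w", encoding="utf-8") as f:
    for record in (qa_example, tool_use_example):
        f.write(json.dumps(record) + "\n")
```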
Developer Training Tier (Public Preview): 50% discount on training jobs

Fine-tuning can be expensive, especially when you may need to run multiple experiments to create the right model for your production agents. To make it easier than ever to get started, we’re introducing the Developer Training tier – providing users with a 50% discount when they choose to run workloads on pre-emptible capacity. It also lets users iterate faster: we support up to 10 concurrent jobs on the Developer tier, making it ideal for running experiments in parallel. Because it uses reclaimable capacity, jobs may be pre‑empted and automatically resumed, so they may take longer to complete.

When to use the Developer Training tier:

When cost matters - great for early experimentation or hyperparameter tuning thanks to 50% lower training cost.
When you need high concurrency - supports up to 10 simultaneous jobs, ideal for running multiple experiments in parallel.
When the workload is non‑urgent - suitable for jobs that can tolerate pre-emption and longer, capacity-dependent runtimes.

Agentic Reinforcement Fine‑Tuning (RFT) (Private Preview): Train reasoning models to use your tools through outcome-based optimization

Building reliable AI agents requires more than copying correct behavior; models need to learn which reasoning paths lead to successful outcomes. While supervised fine-tuning trains models to imitate demonstrations, reinforcement fine-tuning optimizes models based on whether their chain of thought actually generates a successful outcome. It teaches them to think in new ways, about new domains – to solve complex problems.

Agentic RFT applies this to tool-using workflows: the model generates multiple reasoning traces (including tool calls and planning steps), receives feedback on which attempts solved the problem correctly, and updates its reasoning patterns accordingly. This helps models learn effective strategies for tool sequencing, error recovery, and multi-step planning—behaviors that are difficult to capture through demonstrations alone. The difference now is that you can provide your own custom tools for use during chain of thought: models can interact with your own internal systems, retrieve the data they need, and access your proprietary APIs to solve your unique problems.

Agentic RFT is currently available in private preview for o4-mini and GPT-5, with configurable reasoning effort, sampling rates, and per-run telemetry. Request access at aka.ms/agentic-rft-preview.
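Whichever technique you use, SFT on synthetic tool-use dialogues or Agentic RFT, it is worth replaying held-out evaluation queries against the fine-tuned deployment to confirm it selects the right tool before it goes live. The sketch below does this through the OpenAI-compatible chat completions API; the endpoint variables, the deployment name, the tool definition, and the expected-tool labels are all assumptions for illustration.

```python
import os

from openai import OpenAI  # pip install openai

# Assumption: an OpenAI-compatible endpoint for your fine-tuned deployment.
client = OpenAI(
    base_url=os.environ["FOUNDRY_OPENAI_BASE_URL"],
    api_key=os.environ["FOUNDRY_API_KEY"],
)

# Hypothetical tool definition and held-out evaluation cases.
TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_claim_status",
        "description": "Look up the status of an insurance claim by ID.",
        "parameters": {
            "type": "object",
            "properties": {"claim_id": {"type": "string"}},
            "required": ["claim_id"],
        },
    },
}]

EVAL_CASES = [
    {"query": "Is claim 12345 approved yet?", "expected_tool": "get_claim_status"},
    {"query": "How long do I have to submit a claim?", "expected_tool": None},  # should answer directly
]

correct = 0
for case in EVAL_CASES:
    response = client.chat.completions.create(
        model="support-agent-v1",  # assumption: your fine-tuned deployment name
        messages=[{"role": "user", "content": case["query"]}],
        tools=TOOLS,
    )
    message = response.choices[0].message
    called = message.tool_calls[0].function.name if message.tool_calls else None
    correct += int(called == case["expected_tool"])
    print(f"{case['query']!r}: called={called}, expected={case['expected_tool']}")

print(f"Tool-selection accuracy: {correct}/{len(EVAL_CASES)}")
```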
What are customers saying?

Fine-tuning is critical to achieving the accuracy and latency needed for enterprise agentic workloads. Decagon is used by many of the world’s most respected enterprises to build, manage, and scale AI agents that can resolve millions of customer inquiries across chat, email, and voice – 24 hours a day, seven days a week. This experience is powered by fine-tuning:

“Providing accurate responses with minimal latency is fundamental to Decagon’s product experience. We saw an opportunity to reduce latency while improving task-specific accuracy by fine-tuning models using our proprietary datasets. Via fine-tuning, we were able to exceed the performance of larger state of the art models with smaller, lighter-weight models which could be served significantly faster.” -- Cyrus Asgari, Lead Research Engineer for fine-tuning at Decagon

But it’s not just agent-first startups seeing results. Companies like Discovery Bank are using fine-tuned models to provide better customer experiences with personal banking agents:

“We consolidated three steps into one, response times that were previously five or six seconds came down to one and a half to two seconds on average. This approach made the system more efficient and the 50% reduction in latency made conversations with Discovery AI feel seamless.” -- Stuart Emslie, Head of Actuarial and Data Science at Discovery Bank

Fine-tuning has evolved from an optimization technique to essential infrastructure for production AI. Whether building specialized agents or enhancing existing products, the pattern is clear: custom-trained models deliver the accuracy and speed that general-purpose models can’t match. As techniques like Agentic RFT and synthetic data generation mature, the question isn’t whether to fine-tune, but how to build the systems to do it systematically.

Learn More

🧠 Get Started with fine-tuning with Azure AI Foundry on Microsoft Learn Docs
▶️ Watch On-Demand: https://ignite.microsoft.com/en-US/sessions/BRK188?source=sessions
👩 Try the demos: aka.ms/FT-ignite-demos
👋 Continue the conversation on Discord

Foundry IQ: Unlocking ubiquitous knowledge for agents

Introducing Foundry IQ by Azure AI Search in Microsoft Foundry. Foundry IQ is a centralized knowledge layer that connects agents to data with the next generation of retrieval-augmented generation (RAG). Foundry IQ includes the following features:

Knowledge bases – Available directly in the new Foundry portal, knowledge bases are reusable, topic-centric collections that ground multiple agents and applications through a single API.
Automated indexed and federated knowledge sources – Expand what data an agent can reach by connecting to both indexed and remote knowledge sources. For indexed sources, Foundry IQ delivers automatic indexing, vectorization, and enrichment for text, images, and complex documents.
Agentic retrieval engine in knowledge bases – A self-reflective query engine that uses AI to plan, select sources, search, rank, and synthesize answers across sources with configurable “retrieval reasoning effort.”
Enterprise-grade security and governance – Support for document-level access control, alignment with existing permissions models, and options for both indexed and remote data.

Foundry IQ is available in public preview through the new Foundry portal and the Azure portal with Azure AI Search. Foundry IQ is part of Microsoft’s intelligence layer alongside Fabric IQ and Work IQ.
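Because knowledge bases ground agents through a single API, the grounding step from an agent’s point of view is conceptually one retrieval request per question. The sketch below is purely illustrative and is not the documented contract: the route, API version, parameter names, and payload shape are assumptions, so consult the Azure AI Search and Foundry IQ documentation for the actual request format.

```python
import os

import requests  # pip install requests

# All of the following are placeholders/assumptions for illustration only;
# the real endpoint path, API version, and payload schema are defined by the
# Azure AI Search / Foundry IQ documentation.
SEARCH_ENDPOINT = os.environ["AZURE_SEARCH_ENDPOINT"]   # e.g. https://<service>.search.windows.net
API_KEY = os.environ["AZURE_SEARCH_API_KEY"]
KNOWLEDGE_BASE = "contoso-hr-policies"                  # hypothetical knowledge base name


def ground_question(question: str) -> dict:
    """Send one retrieval request to a knowledge base and return the JSON response."""
    response = requests.post(
        f"{SEARCH_ENDPOINT}/knowledgeBases/{KNOWLEDGE_BASE}/retrieve",  # assumed route
        params={"api-version": "2025-11-01-preview"},                   # assumed version
        headers={"api-key": API_KEY, "Content-Type": "application/json"},
        json={
            "messages": [{"role": "user", "content": question}],
            "retrievalReasoningEffort": "medium",  # assumed name for the configurable effort
        },
        timeout=30,
    )
    response.raise_for_status()
    return response.json()


if __name__ == "__main__":
    result = ground_question("How many vacation days do new employees get?")
    print(result)
```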