Building a multimodal, multi-agent system using Azure AI Agent Service and OpenAI Agent SDK
In the rapidly evolving landscape of artificial intelligence (AI), the development of systems that can autonomously interact, learn, and make decisions has become a focal point. A pivotal aspect of this advancement is the architecture of these systems, specifically the distinction between single-agent and multi-agent frameworks.

Single-Agent Systems

A single-agent system consists of one autonomous entity operating within an environment to achieve specific goals. This agent perceives its surroundings, processes information, and acts accordingly, all in isolation. For example, a standalone chatbot designed to handle customer inquiries functions as a single-agent system, managing interactions without collaborating with other agents.

Multi-Agent Systems

In contrast, a multi-agent system (MAS) comprises multiple autonomous agents that interact within a shared environment. These agents can collaborate, negotiate, or even compete to achieve individual or collective objectives. For instance, in a smart manufacturing setup, various robots (agents) might work together on an assembly line, each performing distinct tasks but coordinating to optimize the overall production process.

Distinctions Between Single-Agent and Multi-Agent Architectures

- Interaction Dynamics: Single-agent systems operate independently without the need for communication protocols. In contrast, MAS require sophisticated mechanisms for agents to interact effectively, ensuring coordination and conflict resolution.
- Complexity and Scalability: While single-agent systems are generally simpler to design and implement, they may struggle with complex or large-scale problems. MAS offer scalability by distributing tasks among agents, enhancing the system's ability to handle intricate challenges.
- Robustness and Fault Tolerance: The decentralized nature of MAS contributes to greater resilience. If one agent fails, others can adapt or take over its functions, maintaining overall system performance. Single-agent systems lack this redundancy, making them more vulnerable to failures.

Context of This Guide

This guide focuses on setting up a Telco Customer Service use case using OpenAI's Agent SDK within a multi-agent architecture. By leveraging Microsoft's Azure AI Agent Service and integrating Azure AI Search, we aim to create a system where specialized agents collaborate to provide efficient and accurate responses to user inquiries. This approach not only showcases the practical application of MAS but also highlights the benefits of combining advanced AI tools to enhance the user experience.

Prerequisites

Before setting up your multi-agent system, ensure you have the following:

- Azure Subscription: An active Azure account is essential to access Azure AI services. If you don't have one, you can create a free account.
- Azure AI Foundry Access: Access to Azure AI Foundry is necessary for creating AI hubs and projects.
- Azure AI Search Resource: Set up an Azure AI Search resource to enable the agent to retrieve relevant information efficiently.
- Development Environment: Set up a suitable environment for development, which includes:
  - Azure CLI: Install the Azure Command-Line Interface to manage Azure resources from your terminal. Ensure it's updated to the latest version.
  - Azure AI Foundry SDK: For creating and managing AI agents.
  - OpenAI Agent SDK: Install the OpenAI Agent SDK to facilitate the development of agentic applications.
  - Code Editor: Such as Visual Studio Code, for writing and editing your deployment scripts.
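Before moving on, it can help to confirm that the development environment and Azure credentials are wired up correctly. The snippet below is a minimal connectivity check rather than part of the original walkthrough; it assumes the azure-ai-projects and azure-identity packages are installed, that you are signed in via the Azure CLI, and that the placeholder connection string is replaced with the value from your project's Overview page.

```python
# Quick sanity check for the development environment: authenticate with
# DefaultAzureCredential (for example after `az login`), connect to the
# AI Foundry project, then create and delete a throwaway agent thread.
# Assumes: pip install azure-ai-projects azure-identity
from azure.ai.projects import AIProjectClient
from azure.identity import DefaultAzureCredential

# Placeholder: copy the real connection string from the project Overview page.
conn_str = "<region>.api.azureml.ms;<subscription-id>;<resource-group>;<project-name>"

project_client = AIProjectClient.from_connection_string(
    conn_str=conn_str,
    credential=DefaultAzureCredential(),
)

# Creating and deleting a thread exercises both authentication and project access.
thread = project_client.agents.create_thread()
print(f"Connected: created test thread {thread.id}")
project_client.agents.delete_thread(thread.id)
```

If the thread is created and deleted without an authentication error, the project connection string and your credentials are ready for the steps that follow.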
Setting Up Azure AI Agent Service Follow this blog to set up an AI Hub in Azure AI Foundry, deploy a GPT-4o model, and create your AI agent with specific instructions and tools. Add the Azure AI Search tool by following this guide. Ensure you have a sample knowledge reference PDF document uploaded to the blob storage for indexing. Setting Up Multimodal, Multi-Agent System This code implements a conversational AI application using Azure OpenAI and Chainlit. It defines multiple specialized agents to handle user interactions, each with distinct responsibilities: Setup your local development environment: Follow the steps below from cloning the repository to running the chainlit application. You can find the “Project connection string” inside your project “Overview” section in AI Foundry. Still in AI Foundry, “Agent ID” can be found inside your “Agents” section. Azure OpenAI credentials can be found under "Models + endpoints" # Your AI Foundry Project connection string, found in the Foundry Project Overview page AIPROJECT_CONNECTION_STRING="<your-foundry-project-region>.api.azureml.ms;<your-subscription-id>;<your-resource-group>;<your-foundry-project>" FAQ_AGENT_ID=<agent-id> # Azure OpenAI Configuration AZURE_OPENAI_API_KEY=your_azure_openai_api_key AZURE_OPENAI_API_VERSION=2024-02-15-preview AZURE_OPENAI_ENDPOINT=https://your-resource-name.openai.azure.com AZURE_OPENAI_DEPLOYMENT_NAME=gpt-4o 1. git clone -b Multi-AI-Agent-OpenAI-Agent-SDK https://github.com/robrita/tech-blogs 2. copy sample.env to .env and update 3. python -m venv venv 4. .\venv\Scripts\activate 5. python -m pip install -r requirements.txt 6. chainlit run app.py Main Components: Agents: Triage Agent: Routes user requests to the appropriate specialized agent based on the user’s query. FAQ Agent: Answers frequently asked questions by using an external FAQ lookup tool which is basically our Azure AI Agent Service. Account Management Agent: Handles user account updates, such as changing usernames, uploading ID images, and updating birth dates. Live Agent: Simulates a human customer service representative named Sarah, handling complex issues or explicit requests for human assistance. faq_agent = Agent[TelcoAgentContext]( name="FAQ Agent", handoff_description="A helpful agent that can answer questions about Telco Digital.", instructions=f"""{RECOMMENDED_PROMPT_PREFIX} You are an FAQ agent. If you are speaking to a customer, you probably were transferred to from the triage agent. Use the following routine to support the customer. # Routine 1. Identify the last question asked by the customer. 2. Use the faq lookup tool to answer the question. Do not rely on your own knowledge. 3. If you cannot answer the question, transfer back to the triage agent.""", tools=[faq_lookup_tool], model=OpenAIChatCompletionsModel( model=DEPLOYMENT_NAME, openai_client=azure_client, ), ) account_management_agent = Agent[TelcoAgentContext]( name="Account Management Agent", handoff_description="A helpful agent that can update customer user name.", instructions=f"""{RECOMMENDED_PROMPT_PREFIX} You are an account management agent. If you are speaking to a customer, you probably were transferred to from the triage agent. Use the following routine to support the customer. # Routine 1. Ask for their new user name, an image of their government ID or passport to be uploaded, and their birth date. # Very Important Notes - If the customer asks a question that is not related to the routine, transfer back to the triage agent. 
- Never respond to the user with any PII data such as password, ID number, etc.""", tools=[update_user_name], model=OpenAIChatCompletionsModel( model=DEPLOYMENT_NAME, openai_client=azure_client, ), ) live_agent = Agent[TelcoAgentContext]( name="Live Agent", handoff_description="A live human agent that can handle complex issues or when a user specifically requests human assistance.", instructions=f"""{RECOMMENDED_PROMPT_PREFIX} You are a human customer service representative named Sarah. Act as if you are a real person, not an AI. Use a conversational, friendly tone and occasionally add small human touches to your responses. You should handle: 1. Complex issues that other agents couldn't resolve 2. Situations where a user has asked the same question multiple times 3. When a user explicitly asks to speak with a human agent 4. Technical errors or issues within the application # Human touches you can incorporate: - Mention taking notes: "Let me note that down for you" - Reference checking systems: "Let me check our system for that information" - Personalize responses: "I understand how frustrating that can be" - Occasionally mention your "team" or "colleagues" If the customer's issue is resolved or is actually simple enough for the automated system to handle, you can transfer them back to the triage agent.""", tools=[], model=OpenAIChatCompletionsModel( model=DEPLOYMENT_NAME, openai_client=azure_client, ), ) triage_agent = Agent[TelcoAgentContext]( name="Triage Agent", handoff_description="A triage agent that can delegate a customer's request to the appropriate agent.", instructions=( f"{RECOMMENDED_PROMPT_PREFIX} " "You are a helpful triaging agent. You can use your tools to delegate questions to other appropriate agents." "Use the response from other agents to answer the question. Do not rely on your own knowledge." "Other than greetings, do not answer any questions yourself." "If a user explicitly asks for a human agent or live support, transfer them to the Live Agent." "If a user is asking the same question more than two times, transfer them to the Live Agent." "# Very Important Notes" "- Never respond to the user with any PII data such as password, ID number, etc." ), handoffs=[ handoff(agent=account_management_agent, on_handoff=on_account_management_handoff), faq_agent, live_agent, ], model=OpenAIChatCompletionsModel( model=DEPLOYMENT_NAME, openai_client=azure_client, ), ) Tools: faq_lookup_tool: Queries an external FAQ system to answer user questions. @function_tool( name_override="faq_lookup_tool", description_override="Lookup frequently asked questions." 
) async def faq_lookup_tool(question: str) -> str: print(f"User Question: {question}") start_time = cl.user_session.get("start_time") print(f"Elapsed time: {(time.time() - start_time):.2f} seconds - faq_lookup_tool") is_first_token = None try: # create thread for the agent thread_id = cl.user_session.get("new_threads").get(FAQ_AGENT_ID) print(f"thread ID: {thread_id}") # Create a message, with the prompt being the message content that is sent to the model project_client.agents.create_message( thread_id=thread_id, role="user", content=question, ) async with cl.Step(name="faq-agent") as step: step.input = question # Run the agent to process tne message in the thread with project_client.agents.create_stream(thread_id=thread_id, agent_id=FAQ_AGENT_ID) as stream: for event_type, event_data, _ in stream: if isinstance(event_data, MessageDeltaChunk): # Stream the message delta chunk await step.stream_token(event_data.text) if not is_first_token: print(f"Elapsed time: {(time.time() - start_time):.2f} seconds - {event_data.text}") is_first_token = True elif isinstance(event_data, ThreadRun): if event_data.status == "failed": print(f"Run failed. Error: {event_data.last_error}") raise Exception(event_data.last_error) elif event_type == AgentStreamEvent.ERROR: print(f"An error occurred. Data: {event_data}") raise Exception(event_data) # Get all messages from the thread messages = project_client.agents.list_messages(thread_id) # Get the last message from the agent last_msg = messages.get_last_text_message_by_role(MessageRole.AGENT) if not last_msg: raise Exception("No response from the model.") # Delete the thread later after processing delete_threads = cl.user_session.get("delete_threads") or [] delete_threads.append(thread_id) cl.user_session.set("delete_threads", delete_threads) # print(f"Last message: {last_msg.text.value}") return last_msg.text.value except Exception as e: logger.error(f"Error: {e}") return "I'm sorry, I encountered an error while processing your request. Please try again." update_user_name: Updates user account information based on provided details. @function_tool async def update_user_name( context: RunContextWrapper[TelcoAgentContext], user_name: str, image_path: str, birth_date: str, ) -> str: """ Update the customer user name using government ID or passport image and birth date. Args: user_name: The new customer user name. image_path: image file path of government ID or passport. birth_date: The customer birth date. """ # Update the context context.context.user_name = user_name context.context.image_path = image_path context.context.birth_date = birth_date print(f"Context context: {context.context}") # Ensure that the user ID has been set by the incoming handoff assert context.context.user_id is not None, "User ID is required" return f"Updated user name to {user_name}. ID image saved successfully." Session Management: Uses Chainlit’s user_session to store and manage session-specific data, such as the current agent, input history, context, and thread IDs. Thread Management: Creates and deletes conversation threads using Azure AI Project Client to manage isolated conversations for each agent interaction. Streaming Responses: Streams responses from Azure OpenAI models to the user interface in real-time, providing immediate feedback (“thinking…”) and incremental updates. Error Handling: Implements robust error handling to gracefully inform users of issues during processing. 
Chainlit Integration: Uses Chainlit decorators (@cl.on_chat_start, cl.on_message) to handle chat initialization and incoming messages. Full code for reference: from __future__ import annotations as _annotations import os import time import logging import asyncio import random import chainlit as cl from pydantic import BaseModel from dotenv import load_dotenv from azure.ai.projects import AIProjectClient from azure.identity import DefaultAzureCredential from openai.types.responses import ResponseTextDeltaEvent from openai import AsyncAzureOpenAI from azure.ai.projects.models import ( AgentStreamEvent, MessageDeltaChunk, MessageRole, ThreadRun, ) from agents import ( Agent, RunContextWrapper, Runner, TResponseInputItem, function_tool, handoff, OpenAIChatCompletionsModel, set_tracing_disabled, set_default_openai_client, set_default_openai_api ) from agents.extensions.handoff_prompt import RECOMMENDED_PROMPT_PREFIX load_dotenv() # Disable verbose connection logs logger = logging.getLogger("azure.core.pipeline.policies.http_logging_policy") logger.setLevel(logging.WARNING) set_tracing_disabled(True) AIPROJECT_CONNECTION_STRING = os.getenv("AIPROJECT_CONNECTION_STRING") DEPLOYMENT_NAME = os.getenv("AZURE_OPENAI_DEPLOYMENT_NAME") FAQ_AGENT_ID = os.getenv("FAQ_AGENT_ID") azure_client = AsyncAzureOpenAI( api_version=os.getenv("AZURE_OPENAI_API_VERSION"), azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"), api_key=os.getenv("MY_OPENAI_API_KEY"), ) set_default_openai_client(azure_client, use_for_tracing=False) set_default_openai_api("chat_completions") project_client = AIProjectClient.from_connection_string( conn_str=AIPROJECT_CONNECTION_STRING, credential=DefaultAzureCredential() ) class TelcoAgentContext(BaseModel): user_name: str | None = None image_path: str | None = None birth_date: str | None = None user_id: str | None = None ### TOOLS @function_tool( name_override="faq_lookup_tool", description_override="Lookup frequently asked questions." ) async def faq_lookup_tool(question: str) -> str: print(f"User Question: {question}") start_time = cl.user_session.get("start_time") print(f"Elapsed time: {(time.time() - start_time):.2f} seconds - faq_lookup_tool") is_first_token = None try: # create thread for the agent thread_id = cl.user_session.get("new_threads").get(FAQ_AGENT_ID) print(f"thread ID: {thread_id}") # Create a message, with the prompt being the message content that is sent to the model project_client.agents.create_message( thread_id=thread_id, role="user", content=question, ) async with cl.Step(name="faq-agent") as step: step.input = question # Run the agent to process tne message in the thread with project_client.agents.create_stream(thread_id=thread_id, agent_id=FAQ_AGENT_ID) as stream: for event_type, event_data, _ in stream: if isinstance(event_data, MessageDeltaChunk): # Stream the message delta chunk await step.stream_token(event_data.text) if not is_first_token: print(f"Elapsed time: {(time.time() - start_time):.2f} seconds - {event_data.text}") is_first_token = True elif isinstance(event_data, ThreadRun): if event_data.status == "failed": print(f"Run failed. Error: {event_data.last_error}") raise Exception(event_data.last_error) elif event_type == AgentStreamEvent.ERROR: print(f"An error occurred. 
Data: {event_data}") raise Exception(event_data) # Get all messages from the thread messages = project_client.agents.list_messages(thread_id) # Get the last message from the agent last_msg = messages.get_last_text_message_by_role(MessageRole.AGENT) if not last_msg: raise Exception("No response from the model.") # Delete the thread later after processing delete_threads = cl.user_session.get("delete_threads") or [] delete_threads.append(thread_id) cl.user_session.set("delete_threads", delete_threads) # print(f"Last message: {last_msg.text.value}") return last_msg.text.value except Exception as e: logger.error(f"Error: {e}") return "I'm sorry, I encountered an error while processing your request. Please try again." @function_tool async def update_user_name( context: RunContextWrapper[TelcoAgentContext], user_name: str, image_path: str, birth_date: str, ) -> str: """ Update the customer user name using government ID or passport image and birth date. Args: user_name: The new customer user name. image_path: image file path of government ID or passport. birth_date: The customer birth date. """ # Update the context context.context.user_name = user_name context.context.image_path = image_path context.context.birth_date = birth_date print(f"Context context: {context.context}") # Ensure that the user ID has been set by the incoming handoff assert context.context.user_id is not None, "User ID is required" return f"Updated user name to {user_name}. ID image saved successfully." ### HOOKS async def on_account_management_handoff(context: RunContextWrapper[TelcoAgentContext]) -> None: user_id = f"ID-{random.randint(100, 999)}" context.context.user_id = user_id ### AGENTS faq_agent = Agent[TelcoAgentContext]( name="FAQ Agent", handoff_description="A helpful agent that can answer questions about Telco Digital.", instructions=f"""{RECOMMENDED_PROMPT_PREFIX} You are an FAQ agent. If you are speaking to a customer, you probably were transferred to from the triage agent. Use the following routine to support the customer. # Routine 1. Identify the last question asked by the customer. 2. Use the faq lookup tool to answer the question. Do not rely on your own knowledge. 3. If you cannot answer the question, transfer back to the triage agent.""", tools=[faq_lookup_tool], model=OpenAIChatCompletionsModel( model=DEPLOYMENT_NAME, openai_client=azure_client, ), ) account_management_agent = Agent[TelcoAgentContext]( name="Account Management Agent", handoff_description="A helpful agent that can update customer user name.", instructions=f"""{RECOMMENDED_PROMPT_PREFIX} You are an account management agent. If you are speaking to a customer, you probably were transferred to from the triage agent. Use the following routine to support the customer. # Routine 1. Ask for their new user name, an image of their government ID or passport to be uploaded, and their birth date. # Very Important Notes - If the customer asks a question that is not related to the routine, transfer back to the triage agent. - Never respond to the user with any PII data such as password, ID number, etc.""", tools=[update_user_name], model=OpenAIChatCompletionsModel( model=DEPLOYMENT_NAME, openai_client=azure_client, ), ) live_agent = Agent[TelcoAgentContext]( name="Live Agent", handoff_description="A live human agent that can handle complex issues or when a user specifically requests human assistance.", instructions=f"""{RECOMMENDED_PROMPT_PREFIX} You are a human customer service representative named Sarah. Act as if you are a real person, not an AI. 
Use a conversational, friendly tone and occasionally add small human touches to your responses. You should handle: 1. Complex issues that other agents couldn't resolve 2. Situations where a user has asked the same question multiple times 3. When a user explicitly asks to speak with a human agent 4. Technical errors or issues within the application # Human touches you can incorporate: - Mention taking notes: "Let me note that down for you" - Reference checking systems: "Let me check our system for that information" - Personalize responses: "I understand how frustrating that can be" - Occasionally mention your "team" or "colleagues" If the customer's issue is resolved or is actually simple enough for the automated system to handle, you can transfer them back to the triage agent.""", tools=[], model=OpenAIChatCompletionsModel( model=DEPLOYMENT_NAME, openai_client=azure_client, ), ) triage_agent = Agent[TelcoAgentContext]( name="Triage Agent", handoff_description="A triage agent that can delegate a customer's request to the appropriate agent.", instructions=( f"{RECOMMENDED_PROMPT_PREFIX} " "You are a helpful triaging agent. You can use your tools to delegate questions to other appropriate agents." "Use the response from other agents to answer the question. Do not rely on your own knowledge." "Other than greetings, do not answer any questions yourself." "If a user explicitly asks for a human agent or live support, transfer them to the Live Agent." "If a user is asking the same question more than two times, transfer them to the Live Agent." "# Very Important Notes" "- Never respond to the user with any PII data such as password, ID number, etc." ), handoffs=[ handoff(agent=account_management_agent, on_handoff=on_account_management_handoff), faq_agent, live_agent, ], model=OpenAIChatCompletionsModel( model=DEPLOYMENT_NAME, openai_client=azure_client, ), ) faq_agent.handoffs.append(triage_agent) account_management_agent.handoffs.append(triage_agent) live_agent.handoffs.append(triage_agent) async def main(user_input: str) -> None: current_agent = cl.user_session.get("current_agent") input_items = cl.user_session.get("input_items") context = cl.user_session.get("context") print(f"Received message: {user_input}") # Show thinking message to user msg = await cl.Message(f"thinking...", author="agent").send() msg_final = cl.Message("", author="agent") # Set an empty list for delete_threads in the user session cl.user_session.set("delete_threads", []) is_thinking = True try: input_items.append({"content": user_input, "role": "user"}) # Run the agent with streaming result = Runner.run_streamed(current_agent, input_items, context=context) last_agent = "" # Stream the response async for event in result.stream_events(): # Get the last agent name if event.type == "agent_updated_stream_event": if is_thinking: last_agent = event.new_agent.name msg.content = f"[{last_agent}] thinking..." await msg.send() # Get the message delta chunk elif event.type == "raw_response_event" and isinstance(event.data, ResponseTextDeltaEvent): if is_thinking: is_thinking = False await msg.remove() msg_final.content = f"[{last_agent}] " await msg_final.send() await msg_final.stream_token(event.data.delta) # Update the current agent and input items in the user session cl.user_session.set("current_agent", result.last_agent) cl.user_session.set("input_items", result.to_input_list()) except Exception as e: logger.error(f"Error: {e}") msg_final.content = "I'm sorry, I encountered an error while processing your request. Please try again." 
# show the last response in the UI await msg_final.update() # Delete threads after processing delete_threads = cl.user_session.get("delete_threads") or [] for thread_id in delete_threads: try: project_client.agents.delete_thread(thread_id) print(f"Deleted thread: {thread_id}") except Exception as e: print(f"Error deleting thread {thread_id}: {e}") # Create new thread for the next message new_threads = cl.user_session.get("new_threads") or {} for key in new_threads: if new_threads[key] in delete_threads: thread = project_client.agents.create_thread() new_threads[key] = thread.id print(f"Created new thread: {thread.id}") # Update new threads in the user session cl.user_session.set("new_threads", new_threads) # Chainlit setup @cl.on_chat_start async def on_chat_start(): # Initialize user session current_agent: Agent[TelcoAgentContext] = triage_agent input_items: list[TResponseInputItem] = [] cl.user_session.set("current_agent", current_agent) cl.user_session.set("input_items", input_items) cl.user_session.set("context", TelcoAgentContext()) # Create a thread for the agent thread = project_client.agents.create_thread() cl.user_session.set("new_threads", { FAQ_AGENT_ID: thread.id, }) @cl.on_message async def on_message(message: cl.Message): cl.user_session.set("start_time", time.time()) user_input = message.content for element in message.elements: # check if the element is an image if element.mime.startswith("image/"): user_input += f"\n[uploaded image] {element.path}" print(f"Received file: {element.path}") asyncio.run(main(user_input)) if __name__ == "__main__": # Chainlit will automatically run the application pass Workflow: When a user sends a message: The Triage Agent initially handles the request. Based on the user’s input, the Triage Agent delegates the request to the appropriate specialized agent (FAQ, Account Management, or Live Agent). The selected agent processes the request using its defined tools and instructions. Responses are streamed back to the user interface. After processing, temporary conversation threads are cleaned up, and new threads are created for subsequent interactions. Technologies Used: Azure OpenAI: For generating conversational responses. Azure AI Project Client: For managing agent threads and messages. OpenAI Agents SDK: For orchestrating multiple agents. Chainlit: For building interactive conversational UI. Pydantic: For structured data modeling. Asyncio: For asynchronous operations and streaming responses. In summary, multi-agent system provides a structured, modular conversational AI system designed to handle customer interactions efficiently, delegate tasks to specialized agents, manage user sessions, and integrate seamlessly with Azure’s AI services. Resources Implementation can be found at Multi-AI-Agent-OpenAI-Agent-SDK References: https://platform.openai.com/docs/guides/agents-sdk https://learn.microsoft.com/en-us/azure/ai-services/agents/quickstart https://learn.microsoft.com/en-us/azure/search/ ~Cheers! Robert Rita AI Cloud Solution Architect, ASEAN https://www.linkedin.com/in/robertrita/ #r0bai31Views0likes1CommentDeploy Your First Azure AI Agent Service on Azure App Service
1. Introduction

Azure AI Agent Service is a fully managed service designed to empower developers to securely build, deploy, and scale high-quality, extensible AI agents without needing to manage the underlying compute and storage resources. These AI agents act as "smart" microservices that can answer questions, perform actions, or automate workflows by combining generative AI models with tools that allow them to interact with real-world data sources.

Deploying Azure AI Agent Service on Azure App Service offers several benefits:

- Scalability: Azure App Service provides automatic scaling options to handle varying loads.
- Security: Built-in security features ensure that your AI agents are protected.
- Ease of Deployment: Simplified deployment processes allow developers to focus on building and improving their AI agents rather than managing infrastructure.

2. Prerequisites

Before you begin deploying Azure AI Agent Service on Azure App Service, ensure you have the following prerequisites in place:

- Azure Subscription: You need an active Azure subscription. If you don't have one, you can create a free account on the Azure portal.
- Azure AI Foundry Access: Azure AI Foundry is the platform where you create and manage your AI agents. Ensure you have access to Azure AI Foundry and have the necessary permissions to create hubs and projects.
- Basic Knowledge of Azure App Service: Familiarity with Azure App Service is essential for configuring and deploying your AI agent. Understanding the basics of resource groups, app services, and hosting plans will be beneficial.
- Development Environment: Set up your development environment with the required tools and SDKs. This includes:
  - Azure CLI: For managing Azure resources from the command line.
  - Azure AI Foundry SDK: For creating and managing AI agents.
  - Code Editor: Such as Visual Studio Code, for writing and editing your deployment scripts.

3. Setting Up Azure AI Agent Service

To harness the capabilities of Azure AI Agent Service, follow these steps to set up the environment:

a. Create an Azure AI Hub and Project

Begin by establishing an AI Hub and initiating a new project within Azure AI Foundry:

- Access Azure Portal: Log in to the Azure Portal using your Azure credentials.
- Create AI Hub: Navigate to the search bar and search for "AI Foundry". Select "AI Foundry", click "Create", and select "Hub". Provide necessary details such as subscription, resource group, region, and name, and connect AI services. Review and create the AI Hub.
- Create a Project: Within the newly created AI Hub, click "Launch Azure AI Foundry". Under your new AI Hub, click "New project" and click "Create".

b. Deploy an Azure OpenAI Model

With the project in place, deploy a suitable AI model:

- Model Deployment: On the left-hand side of the project panel, select "Models + Endpoints" and click "Deploy model". Select "Deploy base model", choose "gpt-4o", and click "Confirm". Leave the default settings and click "Deploy".

Detailed guidance is available in the Quickstart documentation.

4. Create and Configure the AI Agent

After setting up the environment and deploying the model, proceed to create the AI agent. On the left-hand side of the project panel, select "Agents". Click "New agent" and the default agent will be created, already connected to your Azure OpenAI model.

1. Define Instructions: Craft clear and concise instructions that guide the agent's interactions. For example:

instructions = "You are a helpful assistant capable of answering queries and performing tasks."

2.
Integrate Tools: Incorporate tools to enhance the agent’s capabilities, such as: Code Interpreter: Allows the agent to execute code for data analysis. OpenAPI Tools: Enable the agent to interact with external APIs. Enable Code Interpreter tool: Still on the agent settings, in the “Actions” section, click “Add” and select “Code interpreter” and click “Save”. On the same agent settings panel at the top, click “Try in playground”. Do some quick test by entering “Hi” to the agent. 5. Develop a Chat Application Utilize the Azure AI Foundry SDK to instantiate and integrate up the agent. In this tutorial we will be using chainlit - an open-source Python package to quickly build Conversational AI application. 1. Setup your local development environment: Follow the steps below from cloning the repository to running the chainlit application. You can find the “Project connection string” inside your project “Overview” section in AI Foundry. Still in AI Foundry, “Agent ID” can be found inside your “Agents” section. git clone -b Deploy-AI-Agent-App-Service https://github.com/robrita/tech-blogs copy sample.env to .env and update python -m venv venv .\venv\Scripts\activate python -m pip install -r requirements.txt chainlit run app.py 2. Full code for reference: import os import chainlit as cl import logging from dotenv import load_dotenv from azure.ai.projects import AIProjectClient from azure.identity import DefaultAzureCredential from azure.ai.projects.models import ( MessageRole, ) # Load environment variables load_dotenv() # Disable verbose connection logs logger = logging.getLogger("azure.core.pipeline.policies.http_logging_policy") logger.setLevel(logging.WARNING) AIPROJECT_CONNECTION_STRING = os.getenv("AIPROJECT_CONNECTION_STRING") AGENT_ID = os.getenv("AGENT_ID") # Create an instance of the AIProjectClient using DefaultAzureCredential project_client = AIProjectClient.from_connection_string( conn_str=AIPROJECT_CONNECTION_STRING, credential=DefaultAzureCredential() ) # Chainlit setup @cl.on_chat_start async def on_chat_start(): # Create a thread for the agent if not cl.user_session.get("thread_id"): thread = project_client.agents.create_thread() cl.user_session.set("thread_id", thread.id) print(f"New Thread ID: {thread.id}") @cl.on_message async def on_message(message: cl.Message): thread_id = cl.user_session.get("thread_id") try: # Show thinking message to user msg = await cl.Message("thinking...", author="agent").send() project_client.agents.create_message( thread_id=thread_id, role="user", content=message.content, ) # Run the agent to process tne message in the thread run = project_client.agents.create_and_process_run(thread_id=thread_id, agent_id=AGENT_ID) print(f"Run finished with status: {run.status}") # Check if you got "Rate limit is exceeded.", then you want to increase the token limit if run.status == "failed": raise Exception(run.last_error) # Get all messages from the thread messages = project_client.agents.list_messages(thread_id) # Get the last message from the agent last_msg = messages.get_last_text_message_by_role(MessageRole.AGENT) if not last_msg: raise Exception("No response from the model.") msg.content = last_msg.text.value await msg.update() except Exception as e: await cl.Message(content=f"Error: {str(e)}").send() if __name__ == "__main__": # Chainlit will automatically run the application pass 3. Test Agent Functionality: Ensure the agent operates as intended. 6. 
Deploying on Azure App Service

Deploying a Chainlit application on Azure App Service involves creating an App Service instance, configuring your application for deployment, and ensuring it runs correctly in the Azure environment. Here's a step-by-step guide:

1. Create an Azure App Service Instance:
- Log in to the Azure Portal: Access the Azure Portal and sign in with your Azure account.
- Create a New Web App: Navigate to "App Services" and select "Create". Fill in the necessary details:
  - Subscription: Choose your Azure subscription.
  - Resource Group: Select an existing resource group or create a new one.
  - Name: Enter a unique name for your web app.
  - Publish: Choose "Code".
  - Runtime Stack: Select "Python 3.12" or higher.
  - Region: Choose the region closest to your users.
- Review and Create: After filling in the details, click "Review + Create" and then "Create" to provision the App Service.

2. Update Azure App Service Settings:
- Environment Variables: Add both "AIPROJECT_CONNECTION_STRING" and "AGENT_ID".
- Configuration: Set the Startup Command to "startup.sh". Turn "On" the "SCM Basic Auth Publishing Credentials" setting. Turn "On" the "Session affinity" setting. Finally, click "Save".
- Identity: Turn the status "On" under the "System assigned" tab and click "Save".

3. Assign a Role to your AI Foundry Project:
- In the Azure Portal, navigate to "AI Foundry" and select your Azure AI Project where the Agent was created.
- Select "Access Control (IAM)" and click "Add" to add a role assignment.
- In the search bar, enter "AzureML Data Scientist" > "Next" > "Managed identity" > "Select members" > "App Service" > (Your app name) > "Review + Assign".

4. Deploy Your Application to Azure App Service:
- Deployment Methods: Azure App Service supports various deployment methods, including GitHub Actions, Azure DevOps, and direct ZIP uploads. Choose the method that best fits your workflow.
- Using an External Public GitHub Repository: In the Azure Portal, navigate to your App Service. Go to the "Deployment Center" and select the "External Git" deployment option. Enter the "Repository" (https://github.com/robrita/tech-blogs) and "Branch" (Deploy-AI-Agent-App-Service). Keep "Public" and hit "Save".
- Check Your Deployment: Still under "Deployment Center", click the "Logs" tab to view the deployment status. Once it succeeds, head over to the "Overview" section of your App Service to test the "Default domain".
- Redeploy Your Application: To redeploy your app, under "Deployment Center", click "Sync".

By following these steps, you can successfully deploy your Chainlit application on Azure App Service with first-class Azure AI Agent Service integration, making it accessible to users globally.

Resources

Implementation can be found at Deploy-AI-Agent-App-Service

References:
https://learn.microsoft.com/en-us/azure/ai-services/agents/overview

~Cheers!
Robert Rita
AI Cloud Solution Architect, ASEAN
https://www.linkedin.com/in/robertrita/
#r0bai

Build your own conversational AI agent and share $50K in prizes with Microsoft AI Skills Fest
What if your AI could do more than just respond? With Azure AI Agent Service, developers are building conversational AI agents that not only understand natural language but also take meaningful actions to drive business success. And with customized, self-paced skilling Plans and an upcoming, dedicated Hackathon event (with $50,000 in prizes!), Microsoft Learn is your one-stop resource for developing your own AI agent and further exploring the capabilities of Azure AI Agent Service.

Build and deploy your own AI agent with Azure AI Agent Service

Introduced at Microsoft Ignite 2024, Azure AI Agent Service is a fully managed platform designed to help you build, deploy, and scale high-quality conversational AI agents with minimal complexity. By leveraging Microsoft's advanced AI capabilities, this service provides a streamlined approach to developing intelligent applications that can understand natural language, retrieve relevant information, and take meaningful actions.

Key capabilities of Azure AI Agent Service include:

- Pre-trained models: Access to pre-built models for common use cases, enabling rapid development without starting from scratch.
- Code-first customization: Flexibility to tailor AI solutions to specific business needs through a code-first approach.
- Seamless integration: Ability to connect agents with other Azure services, external APIs, and various data sources, including Microsoft 365 and SharePoint.
- Scalability: Ensures smooth performance under heavy user loads, allowing applications to grow with business needs without infrastructure concerns.
- Advanced NLP: Utilizes natural language processing techniques to facilitate conversational interactions between agents and users.
- Built-in analytics: Provides insights into agent performance and user behavior through an analytics dashboard.

Earn a career-boosting AI skills badge and share in $50K in prizes

Azure AI Agent Service continues a 50-year legacy of software and tools we at Microsoft have developed to help our users innovate, and our curated trove of AI learning resources goes a step further to help you develop crucial skills in a rapidly evolving tech space. That's why we're celebrating our 50th anniversary by launching a 24-hour, worldwide AI learning event aimed at empowering users to acquire new AI skills, drive AI transformation, and propel their careers. On April 8, 2025, Microsoft AI Skills Fest kicks off an exciting 5-week challenge that will not only push learners to achieve a verified AI skill badge that will differentiate them in today's talent pool, but also share part of $50,000 in prizes!

Running through the end of April, the AI Agents Hackathon will guide you through the process of building, training, and deploying autonomous agents across multiple programming languages using Azure AI. By taking part in the Hackathon, you'll be able to:

- Understand the basics of creating an AI agent
- Utilize GitHub Copilot's features to build and enhance AI agents
- Create generative AI solutions with Azure AI Foundry
- Accelerate AI app development with GitHub
- Learn about GitHub Copilot Extensions to your AI agent

The Microsoft AI Skills Fest is an excellent opportunity to deepen your understanding and application of AI agent technologies, and we're hoping to break a world record by skilling 100 million global learners. Join us on April 8th to boost your career and make history!
Start skilling now with our expert-curated Plans on Microsoft Learn

In preparation for Microsoft AI Skills Fest, we've made available a wide range of self-paced, expert-curated learning resources to get you started. Check out our Create agentic AI solutions by using Azure AI Foundry Plan on Learn for the latest advancements in agentic AI technologies. The Plan is designed to empower you to build and integrate AI-driven agents seamlessly into your applications and learn advanced model fine-tuning techniques.

Beyond AI agents, we have in-depth Plans on Microsoft Learn to help you explore the full scope of AI-based skilling we offer:

- Develop, replatform, and improve AI apps with advanced Azure AI services: Master AI app development on Azure by learning to replatform existing applications, assess architectural changes, and build new solutions. You'll explore comprehensive skills in code modernization, security implementation, deployment strategies, and testing across Azure services.
- Build, test, deploy applications securely with GitHub and Microsoft Azure: Elevate your DevOps and development workflow with an in-depth guide to secure application lifecycle management. You'll leverage GitHub Actions, Azure DevOps, and Azure Pipelines to automate processes, implement advanced security measures with GitHub Advanced Security, and adopt industry-leading development best practices.
- Find the best model for your generative AI solution with Azure AI Foundry: Unlock the potential of generative AI solutions using Azure AI Foundry, gaining expertise in model selection, multimodal capabilities, and advanced benchmarking. You'll learn to create sophisticated AI applications by effectively utilizing Azure AI Foundry and Azure OpenAI Service.

Don't miss out on this exciting AI skills challenge!

Looking to take your AI skills to the next level and create your own conversational agent? Then join us on April 8, 2025, for the Microsoft AI Skills Fest and the AI Agents Hackathon to earn your career-boosting skills badge and potentially share in $50,000 in prizes! Prepare for the event by participating in our series of Plans on Microsoft Learn.

Model Context Protocol (MCP): Integrating Azure OpenAI for Enhanced Tool Integration and Prompting
Model Context Protocol serves as a critical communication bridge between AI models and external systems, enabling AI assistants to interact directly with various services through a standardized interface. This protocol was designed to address the inherent limitations of standalone AI models by providing them with pathways to access real-time data, perform actions in external systems, and leverage specialized tools beyond their built-in capabilities. The fundamental architecture of MCP consists of client-server communication where the AI model (client) can send requests to specialized servers that handle specific service integrations, process these requests, and return formatted results that the AI can incorporate into its responses. This design pattern enables AI systems to maintain their core reasoning capabilities while extending their functional reach into practical applications that require interaction with external systems and databases.

MCP has the potential to function as a universal interface: think of it as the virtual/software version of USB-C for AI, enabling seamless, secure, and scalable data exchange between LLMs/AI agents and external resources. MCP uses a client-server architecture where MCP hosts (AI applications) communicate with MCP servers (data/tool providers). Developers can use MCP to build reusable, modular connectors, with pre-built servers available for popular platforms, creating a community-driven ecosystem. MCP's open-source nature encourages innovation, allowing developers to extend its capabilities while maintaining security through features like granular permissions. Ultimately, MCP aims to transform AI agents from isolated chatbots into context-aware, interoperable systems deeply integrated into digital environments.

Key elements of the Model Context Protocol:

- Standardization: MCP provides a standardized way for language models to interact with tools, promoting interoperability.
- Communication Methods: Supports multiple communication methods, including STDIO and SSE, for flexibility in tool integration.
- Tool Integration: Enables language models to use external tools, enhancing their functionality and applicability.

How Does It Work?

MCP operates on a client-server architecture:

- MCP Hosts: These are the AI applications or interfaces, such as IDEs or AI tools, that seek to access data through MCP. They initiate requests for data or actions.
- MCP Clients: These are protocol clients that maintain a one-to-one connection with MCP servers, acting as intermediaries to forward requests and responses.
- MCP Servers: Lightweight programs that expose specific capabilities through the MCP, connecting to local or remote data sources. Examples include servers for file systems, databases, or APIs, each advertising their capabilities for hosts to utilize.
- Local Data Sources: These include the computer's files, databases, and services that MCP servers can securely access, such as reading local documents or querying SQLite databases.
- Remote Services: External systems available over the internet, such as APIs, that MCP servers can connect to, enabling AI to interact with cloud-based tools or services.

Implementation

Let's try to implement an MCP client using Azure OpenAI with Chainlit and the openai Python library. By the end of this blog you will be able to attach any MCP server to your client and start using it with a simple user interface. So let's get started. The first thing we need to ensure is that our MCP tools are listed and loaded into our Chainlit session. (If you don't already have an MCP server to test with, a minimal example server is sketched below, before the client wiring.)
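The sketch below is one way to stand up a minimal MCP server for testing; it is an illustrative assumption rather than part of the original walkthrough. It uses the FastMCP helper from the official MCP Python SDK (installed with pip install "mcp[cli]"), exposes a single made-up get_weather tool, and serves it over STDIO.

```python
# weather_server.py: a minimal MCP server you could attach to the Chainlit client.
# Assumes the official MCP Python SDK is installed: pip install "mcp[cli]"
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("demo-weather")

@mcp.tool()
def get_weather(city: str) -> str:
    """Return a canned weather report for the given city."""
    # A real server would call an external API here; this is a stub for testing.
    return f"The weather in {city} is 24°C and sunny."

if __name__ == "__main__":
    # STDIO transport is the simplest option for local testing.
    mcp.run(transport="stdio")
```

Once it is running, this script should be attachable as a STDIO MCP server from the Chainlit interface, after which its get_weather tool appears in the session's tool list alongside any other connected servers.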
As you install any MCP server , you need to ensure that all the tools of those associated MCP servers are added to your session. .on_chat_start async def start_chat(): client = ChatClient() cl.user_session.set("messages", []) cl.user_session.set("system_prompt", SYSTEM_PROMPT) @cl.on_mcp_connect async def on_mcp(connection, session: ClientSession): result = await session.list_tools() tools = [{ "name": t.name, "description": t.description, "parameters": t.inputSchema, } for t in result.tools] mcp_tools = cl.user_session.get("mcp_tools", {}) mcp_tools[connection.name] = tools cl.user_session.set("mcp_tools", mcp_tools) Next thing we need to do is that we have to flatten the tools as the same will be passed to Azure OpenAI. In this case for each message we pass the loaded MCP server session tools into chat session after flattening it. def flatten(xss): return [x for xs in xss for x in xs] @cl.on_message async def on_message(message: cl.Message): mcp_tools = cl.user_session.get("mcp_tools", {}) tools = flatten([tools for _, tools in mcp_tools.items()]) tools = [{"type": "function", "function": tool} for tool in tools] # Create a fresh client instance for each message client = ChatClient() # Restore conversation history client.messages = cl.user_session.get("messages", []) msg = cl.Message(content="") async for text in client.generate_response(human_input=message.content, tools=tools): await msg.stream_token(text) # Update the stored messages after processing cl.user_session.set("messages", client.messages) Next I define a tool calling step which basically call the MCP session to execute the tool. .step(type="tool") async def call_tool(mcp_name, function_name, function_args): try: print(f"Function Name: {function_name} Function Args: {function_args}") mcp_session, _ = cl.context.session.mcp_sessions.get(mcp_name) func_response = await mcp_session.call_tool(function_name, function_args) except Exception as e: traceback.print_exc() func_response = json.dumps({"error": str(e)}) return str(func_response.content) Next i define a chat client which basically can run as many tools in an iterative manner through for loop (No third party library), simple openai python client. import json from mcp import ClientSession import os import re from aiohttp import ClientSession import chainlit as cl from openai import AzureOpenAI, AsyncAzureOpenAI import traceback from dotenv import load_dotenv load_dotenv("azure.env") SYSTEM_PROMPT = "you are a helpful assistant." class ChatClient: def __init__(self) -> None: self.deployment_name = os.environ["AZURE_OPENAI_MODEL"] self.client = AsyncAzureOpenAI( azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"), api_key=os.getenv("AZURE_OPENAI_API_KEY"), api_version="2024-12-01-preview", ) self.messages = [] self.system_prompt = SYSTEM_PROMPT async def process_response_stream(self, response_stream, tools, temperature=0): """ Recursively process response streams to handle multiple sequential function calls. This function can call itself when a function call is completed to handle subsequent function calls. 
""" function_arguments = "" function_name = "" tool_call_id = "" is_collecting_function_args = False collected_messages = [] try: async for part in response_stream: if part.choices == []: continue delta = part.choices[0].delta finish_reason = part.choices[0].finish_reason # Process assistant content if delta.content: collected_messages.append(delta.content) yield delta.content # Handle tool calls if delta.tool_calls: if len(delta.tool_calls) > 0: tool_call = delta.tool_calls[0] # Get function name if tool_call.function.name: function_name = tool_call.function.name tool_call_id = tool_call.id # Process function arguments delta if tool_call.function.arguments: function_arguments += tool_call.function.arguments is_collecting_function_args = True # Check if we've reached the end of a tool call if finish_reason == "tool_calls" and is_collecting_function_args: # Process the current tool call print(f"function_name: {function_name} function_arguments: {function_arguments}") function_args = json.loads(function_arguments) mcp_tools = cl.user_session.get("mcp_tools", {}) mcp_name = None for connection_name, session_tools in mcp_tools.items(): if any(tool.get("name") == function_name for tool in session_tools): mcp_name = connection_name break reply_to_customer = function_args.get('reply_to_customer') print(f"reply_to_customer: {reply_to_customer}") # Output any replies to the customer if reply_to_customer: tokens = re.findall(r'\s+|\w+|[^\w\s]', reply_to_customer) for token in tokens: yield token # Add the assistant message with tool call self.messages.append({ "role": "assistant", "content": reply_to_customer, "tool_calls": [ { "id": tool_call_id, "function": { "name": function_name, "arguments": function_arguments }, "type": "function" } ] }) func_response = await call_tool(mcp_name, function_name, function_args) # Add the tool response self.messages.append({ "tool_call_id": tool_call_id, "role": "tool", "name": function_name, "content": func_response, }) # Create a new stream to continue processing new_response = await self.client.chat.completions.create( model=self.deployment_name, messages=self.messages, tools=tools, parallel_tool_calls=False, stream=True, temperature=temperature ) # Use a separate try block for recursive processing try: async for token in self.process_response_stream(new_response, tools, temperature): yield token except GeneratorExit: return return # Check if we've reached the end of assistant's response if finish_reason == "stop": # Add final assistant message if there's content if collected_messages: final_content = ''.join([msg for msg in collected_messages if msg is not None]) if final_content.strip(): self.messages.append({"role": "assistant", "content": final_content}) return except GeneratorExit: return except Exception as e: print(f"Error in process_response_stream: {e}") traceback.print_exc() # Main entry point that uses the recursive function async def generate_response(self, human_input, tools, temperature=0): print(f"human_input: {human_input}") self.messages.append({"role": "user", "content": human_input}) response_stream = await self.client.chat.completions.create( model=self.deployment_name, messages=self.messages, tools=tools, parallel_tool_calls=False, stream=True, temperature=temperature ) try: # Process the initial stream with our recursive function async for token in self.process_response_stream(response_stream, tools, temperature): yield token except GeneratorExit: return Conclusion The Model Context Protocol (MCP) is a pivotal development in AI 
integration, offering a standardized, open protocol that simplifies how AI models interact with external data and tools. Its client-server architecture, supported by JSON-RPC 2.0 and flexible transports, ensures efficient and secure communication, while its benefits of standardization, flexibility, security, efficiency, and scalability make it a valuable tool for developers. With diverse use cases like knowledge graph management, database queries, and API integrations, MCP is poised to unlock the full potential of AI applications, breaking down data silos and enhancing responsiveness. For those interested in exploring further, the rich documentation, SDKs, and community resources provide ample opportunities to engage with and contribute to this evolving standard. Here is the GitHub link for the end-to-end demo:

Thanks
Manoranjan Rajguru
AI Global Black Belt, Asia
https://www.linkedin.com/in/manoranjan-rajguru/

Enterprise Application Development with Azure Responses API and Agents SDK
The blog Enterprise Application Development with Azure Responses API and Agents SDK outlines the integration of AI technologies in enterprise applications. It highlights the use of Microsoft's Azure Responses API and OpenAI's Agents SDK, AutoGen, Swarm, LangGraph, and LangMem to build intelligent AI agents. These agents enhance automation, decision-making, and customer engagement by understanding complex queries, executing multi-step operations, and maintaining context across interactions. The document also covers installation, architecture, development, integration, scalability, security, deployment, and real-world implementations.

Unleashing Innovation: AI Agent Development with Azure AI Foundry
Creating AI agents using Azure AI Foundry is a game-changer for businesses and developers looking to harness the power of artificial intelligence. These AI agents can automate complex tasks, provide insightful data analysis, and enhance customer interactions, leading to increased efficiency and productivity. By leveraging Azure AI Foundry, organizations can build, deploy, and manage AI solutions with ease, ensuring they stay competitive in an ever-evolving technological landscape. The importance of creating AI agents lies in their ability to transform operations, drive innovation, and deliver personalized experiences, making them an invaluable asset in today's digital age.

Let's take a look at how to create an agent on Azure AI Foundry. We'll explore some of the features and experiment with its capabilities in the playground. I recommend starting by creating a new resource group with a new Azure OpenAI resource. Once the Azure OpenAI resource is created, follow these steps to get started with Azure AI Foundry Agents.

Implementation Overview

- Open Azure AI Foundry and click on the Azure AI Foundry link at the top right to get to the home page where you'll see all your projects.
- Click on + Create project, then click on Create new hub. Give it a name, then click Next and Create. New resources will be created with your new project.
- Once inside your new project you should see the Agents preview option on the left menu. Select your Azure OpenAI Service resource and click Let's go.

We can now get started with implementation. A model needs to be deployed. However, it's important to consider which models can be used and their regions for creating these agents. A quick summary of what's currently available is in Supported models in Azure AI Agent Service - Azure AI services | Microsoft Learn. Other supported models include Meta-Llama-405B-Instruct, Mistral-large-2407, Cohere-command-r-plus, and Cohere-command-r.

I've deployed gpt-4 as Global Standard and can now create a new agent. Click on + New agent. A new agent will be created, and details such as the agent instructions, model deployment, Knowledge and Action configurations, and model settings are shown.

Incorporating knowledge into AI agents enhances their ability to provide accurate, relevant, and context-specific responses. This makes them more effective in automating tasks, answering complex queries, and supporting decision-making processes. Actions enable AI agents to perform specific tasks and interact with various services and data sources. Here we can leverage these abilities by adding a Custom Function, an OpenAPI 3.0 specified tool, or an Azure Function to help run tasks. The Code Interpreter feature within Actions empowers the agent to read and analyze datasets, generate code, and create visualizations such as graphs and charts. In the next section we'll go deeper into Code Interpreter's abilities.

Code Interpreter

For this next step I'll leverage the weatherHistory.csv file from the Weather Dataset for Code Interpreter to work on.

- Next to Actions, click on + Add, then click on Code interpreter and add the CSV file.
- Update the Instructions to "You are a Weather Data Expert Agent, designed to provide accurate, up-to-date, and detailed weather information."

Let's explore what Code Interpreter can do. Click on Try in playground on the top right. (A scripted alternative to this portal setup is sketched below.)
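For reference, the same agent setup can be scripted instead of clicked through in the portal. The snippet below is a hedged sketch, not taken from the original article: it assumes the azure-ai-projects and azure-identity packages, an AIPROJECT_CONNECTION_STRING environment variable, and placeholder file and agent names, and it uses the SDK's Code Interpreter tool helper to attach the uploaded CSV.

```python
# Illustrative sketch: create the weather agent with Code Interpreter in code.
# Assumes: pip install azure-ai-projects azure-identity, and that
# AIPROJECT_CONNECTION_STRING is exported; names and paths are placeholders.
import os
from azure.ai.projects import AIProjectClient
from azure.ai.projects.models import CodeInterpreterTool, FilePurpose
from azure.identity import DefaultAzureCredential

project_client = AIProjectClient.from_connection_string(
    conn_str=os.environ["AIPROJECT_CONNECTION_STRING"],
    credential=DefaultAzureCredential(),
)

# Upload the dataset so the Code Interpreter tool can read it.
file = project_client.agents.upload_file_and_poll(
    file_path="weatherHistory.csv", purpose=FilePurpose.AGENTS
)
code_interpreter = CodeInterpreterTool(file_ids=[file.id])

# The model value should match your own deployment name (gpt-4 in this walkthrough).
agent = project_client.agents.create_agent(
    model="gpt-4",
    name="weather-data-agent",
    instructions=(
        "You are a Weather Data Expert Agent, designed to provide accurate, "
        "up-to-date, and detailed weather information."
    ),
    tools=code_interpreter.definitions,
    tool_resources=code_interpreter.resources,
)
print(f"Created agent: {agent.id}")
```

This mirrors the portal steps above: the file upload corresponds to adding the CSV under Actions, and the tools/tool_resources arguments correspond to enabling the Code interpreter action.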
I'll start by asking "Can you tell me which month had the most rain?" Code Interpreter already knows that I'm asking a question in reference to the data file I just gave it and will break down the question into multiple steps to provide the best possible answer. We can see that, based on the dataset, August 2010 had the most rain, with 768 instances of rainfall recorded. We'll take it a step further and create a graph using a different question. Let's ask the agent "ok, can you create a bar chart that shows the amount of rainfall for each year using the provided dataset?", to which the agent responds with the following: This is just a quick demonstration of how powerful Code Interpreter can be. Code Interpreter allows for efficient data interpretation and presentation as shown above, making it easier to derive insights and make informed decisions. We'll create and add a Bing Grounding Resource, which will allow an agent to include real-time public web data in its responses. Bing Grounding Resource A Bing Grounding Resource is a powerful tool that enables AI agents to access and incorporate real-time data from the web into their responses, and it also ensures that the information provided by the agents is accurate, current, and relevant. An agent will be able to perform Bing searches when needed, fetching up-to-date information and enhancing the overall reliability and transparency of its responses. By leveraging Bing Grounding, AI agents can deliver more precise and contextually appropriate answers, significantly improving user satisfaction and trust. To add a Bing Grounding Resource to the agent: Create the resource: Navigate to the Azure AI Foundry portal and create a new Bing Grounding resource. Add knowledge: Go to your agent in Azure AI Foundry, click on + Add next to Knowledge on the right side, select Grounding with Bing Search, then + Create connection. Add the connection with an API key. The Bing Grounding resource is now added to your agent. In the playground I'll first ask "Is it raining over downtown New York today?". I will get a live response that also includes links to the sources where the information was retrieved from. The agent responds as shown below: Next I'll ask the agent "How should I prepare for the weather in New York this week? Any clothing recommendations?", to which the agent responds: The agent is able to break down the question in detail using gpt-4, leveraging the source information from Bing and providing appropriate information to the user. Other capabilities, such as custom functions, OpenAPI 3.0 specified tools, and Azure Functions, significantly enhance the versatility and power of Azure AI agents. Custom functions allow agents to perform specialized tasks tailored to specific business needs, while OpenAPI 3.0 specified tools enable seamless integration with a wide range of external services and APIs. Azure Functions further extend the agent's capabilities by allowing it to execute serverless code, automating complex workflows and processes. Together, these features empower developers to build highly functional and adaptable AI agents that can efficiently handle diverse tasks, drive innovation, and deliver exceptional value to users. Conclusion Developing an AI Agent on Azure AI Foundry is a swift and efficient process, thanks to its robust features and comprehensive tools. The platform's Bing Grounding Resource ensures that your AI models are well-informed and contextually accurate, leveraging vast amounts of real-time data to enhance performance.
Additionally, the Code Interpreter simplifies the execution of complex tasks involving data analysis. By utilizing these powerful resources, you can accelerate the development of intelligent agents that are not only capable of understanding and responding to user inputs but also continuously improving through iterative learning. Azure AI Foundry provides a solid foundation for creating innovative AI solutions that can drive significant value across various applications. Additional Resources:
Quickstart - Create a new Azure AI Agent Service project - Azure AI services | Microsoft Learn
How to use Grounding with Bing Search in Azure AI Agent Service - Azure OpenAI | Microsoft Learn

Prompt Engineering for OpenAI's O1 and O3-mini Reasoning Models
Important: Attempting to extract the model's internal reasoning is prohibited, as it violates the acceptable use guidelines. This section explores how O1 and O3-mini differ from GPT-4o in input handling, reasoning capabilities, and response behavior, and outlines prompt engineering best practices to maximize their performance. Finally, we apply these best practices to a legal case analysis scenario. Differences Between O1/O3-mini and GPT-4o Input Structure and Context Handling Built-in Reasoning vs. Prompted Reasoning: O1-series models have built-in chain-of-thought reasoning, meaning they internally reason through steps without needing explicit coaxing from the prompt. In contrast, GPT-4o often benefits from external instructions like "Let's think step by step" to solve complex problems, since it doesn't automatically engage in multi-step reasoning to the same extent. With O1/O3, you can present the problem directly; the model will analyze it deeply on its own. Need for External Information: GPT-4o has a broad knowledge base and access to tools (e.g. browsing, plugins, vision) in certain deployments, which helps it handle a wide range of topics. By comparison, the O1 models have a narrower knowledge base outside their training focus. For example, O1-preview excelled at reasoning tasks but couldn't answer questions about itself due to limited knowledge context. This means when using O1/O3-mini, important background information or context should be included in the prompt if the task is outside common knowledge – do not assume the model knows niche facts. GPT-4o might already know a legal precedent or obscure detail, whereas O1 might require you to provide that text or data. Context Length: The reasoning models come with very large context windows. O1 supports up to 128k tokens of input, and O3-mini accepts up to 200k tokens (with up to 100k tokens output), exceeding GPT-4o's context length. This allows you to feed extensive case files or datasets directly into O1/O3. For prompt engineering, structure large inputs clearly (use sections, bullet points, or headings) so the model can navigate the information. Both GPT-4o and O1 can handle long prompts, but O1/O3's higher capacity means you can include more detailed context in one go, which is useful in complex analyses. Reasoning Capabilities and Logical Deduction Depth of Reasoning: O1 and O3-mini are optimized for methodical, multi-step reasoning. They literally "think longer" before answering, which yields more accurate solutions on complex tasks. For instance, O1-preview solved 83% of problems on a challenging math exam (AIME), compared to GPT-4o's 13% – a testament to its superior logical deduction in specialized domains. These models internally perform chain-of-thought and even self-check their work. GPT-4o is also strong but tends to produce answers more directly; without explicit prompting, it might not analyze as exhaustively, leading to errors in very complex cases that O1 could catch. Handling of Complex vs. Simple Tasks: Because O1-series models default to heavy reasoning, they truly shine on complex problems that have many reasoning steps (e.g. multi-faceted analyses, long proofs). In fact, on tasks requiring five or more reasoning steps, a reasoning model like O1-mini or O3 outperforms GPT-4 by a significant margin (16%+ higher accuracy).
However, this also means that for very simple queries, O1 may “overthink.” Research found that on straightforward tasks (fewer than 3 reasoning steps), O1’s extra analytical process can become a disadvantage – it underperformed GPT-4 in a significant portion of such cases due to excessive reasoning. GPT-4o might answer a simple question more directly and swiftly, whereas O1 might generate unnecessary analysis. The key difference is O1 is calibrated for complexity, so it may be less efficient for trivial Q&A. Logical Deduction Style: When it comes to puzzles, deductive reasoning, or step-by-step problems, GPT-4o usually requires prompt engineering to go stepwise (otherwise it might jump to an answer). O1/O3 handle logical deduction differently: they simulate an internal dialogue or scratchpad. For the user, this means O1’s final answers tend to be well-justified and less prone to logical gaps. It will have effectively done a “chain-of-thought” internally to double-check consistency. From a prompt perspective, you generally don’t need to tell O1 to explain or check its logic – it does so automatically before presenting the answer. With GPT-4o, you might include instructions like “first list the assumptions, then conclude” to ensure rigorous logic; with O1, such instructions are often redundant or even counterproductive. Response Characteristics and Output Optimization Detail and Verbosity: Because of their intensive reasoning, O1 and O3-mini often produce detailed, structured answers for complex queries. For example, O1 might break down a math solution into multiple steps or provide a rationale for each part of a strategy plan. GPT-4o, on the other hand, may give a more concise answer by default or a high-level summary, unless prompted to elaborate. In terms of prompt engineering, this means O1’s responses might be longer or more technical. You have more control over this verbosity through instructions. If you want O1 to be concise, you must explicitly tell it (just as you would GPT-4) – otherwise, it might err on the side of thoroughness. Conversely, if you want a step-by-step explanation in the output, GPT-4o might need to be told to include one, whereas O1 will happily provide one if asked (and has likely done the reasoning internally regardless). Accuracy and Self-Checking: The reasoning models exhibit a form of self-fact-checking. OpenAI notes that O1 is better at catching its mistakes during the response generation, leading to improved factual accuracy in complex responses. GPT-4o is generally accurate, but it can occasionally be confidently wrong or hallucinate facts if not guided. O1’s architecture reduces this risk by verifying details as it “thinks.” In practice, users have observed that O1 produces fewer incorrect or nonsensical answers on tricky problems, whereas GPT-4o might require prompt techniques (like asking it to critique or verify its answer) to reach the same level of confidence. This means you can often trust O1/O3 to get complex questions right with a straightforward prompt, whereas with GPT-4 you might add instructions like “check your answer for consistency with the facts above.” Still, neither model is infallible, so critical factual outputs should always be reviewed. Speed and Cost: A notable difference is that O1 models are slower and more expensive in exchange for their deeper reasoning. O1 Pro even includes a progress bar for long queries. GPT-4o tends to respond faster for typical queries. 
O3-mini was introduced to offer a faster, cost-efficient reasoning model – it’s much cheaper per token than O1 or GPT-4o and has lower latency. However, O3-mini is a smaller model, so while it’s strong in STEM reasoning, it might not match full O1 or GPT-4 in general knowledge or extremely complex reasoning. When prompt engineering for optimal response performance, you need to balance depth vs. speed: O1 might take longer to answer thoroughly. If latency is a concern and the task isn’t maximal complexity, O3-mini (or even GPT-4o) could be a better choice. OpenAI’s guidance is that GPT-4o “is still the best option for most prompts,” using O1 primarily for truly hard problems in domains like strategy, math, and coding. In short, use the right tool for the job – and if you use O1, anticipate longer responses and plan for its slower output (possibly by informing the user or adjusting system timeouts). Prompt Engineering Techniques to Maximize Performance Leveraging O1 and O3-mini effectively requires a slightly different prompting approach than GPT-4o. Below are key prompt engineering techniques and best practices to get the best results from these reasoning models: Keep Prompts Clear and Minimal Be concise and direct with your ask. Because O1 and O3 perform intensive internal reasoning, they respond best to focused questions or instructions without extraneous text. OpenAI and recent research suggest avoiding overly complex or leading prompts for these models. In practice, this means you should state the problem or task plainly and provide only necessary details. There is no need to add “fluff” or multiple rephrasing of the query. For example, instead of writing: “In this challenging puzzle, I’d like you to carefully reason through each step to reach the correct solution. Let’s break it down step by step...”, simply ask: “Solve the following puzzle [include puzzle details]. Explain your reasoning.” The model will naturally do the step-by-step thinking internally and give an explanation. Excess instructions can actually overcomplicate things – one study found that adding too much prompt context or too many examples worsened O1’s performance, essentially overwhelming its reasoning process. Tip: For complex tasks, start with a zero-shot prompt (just the task description) and only add more instruction if you find the output isn’t meeting your needs. Often, minimal prompts yield the best results with these reasoning models. Avoid Unnecessary Few-Shot Examples Traditional prompt engineering for GPT-3/4 often uses few-shot examples or demonstrations to guide the model. With O1/O3, however, less is more. The O1 series was explicitly trained to not require example-laden prompts. In fact, using multiple examples can hurt performance. Research on O1-preview and O1-mini showed that few-shot prompting consistently degraded their performance – even carefully chosen examples made them do worse than a simple prompt in many cases. The internal reasoning seems to get distracted or constrained by the examples. OpenAI’s own guidance aligns with this: they recommend limiting additional context or examples for reasoning models to avoid confusing their internal logic. Best practice: use zero-shot or at most one example if absolutely needed. If you include an example, make it highly relevant and simple. For instance, in a legal analysis prompt, you generally would not prepend a full example case analysis; instead, just ask directly about the new case. 
The only time you might use a demonstration is if the task format is very specific and the model isn't following instructions – then show one brief example of the desired format. Otherwise, trust the model to figure it out from a direct query. Leverage System/Developer Instructions for Role and Format Setting a clear instructional context can help steer the model's responses. With the API (or within a conversation's system message), define the model's role or style succinctly. For example, a system message might say: "You are an expert scientific researcher who explains solutions step-by-step". O1 and O3-mini respond well to such role instructions and will incorporate them in their reasoning. However, remember that they already excel at understanding complex tasks, so your instructions should focus on what kind of output you want, not how to think. Good uses of system/developer instructions include: Defining the task scope or persona: e.g. "Act as a legal analyst" or "Solve the problem as a math teacher explaining to a student." This can influence tone and the level of detail. Specifying the output format: If you need the answer in a structured form (bullet points, a table, JSON, etc.), explicitly say so. O1 and especially O3-mini support structured output modes and will adhere to format requests. For instance: "Provide your findings as a list of key bullet points." Given their logical nature, they tend to follow format instructions accurately, which helps maintain consistency in responses. Setting boundaries: If you want to control verbosity or focus, you can include something like "Provide a brief conclusion after the detailed analysis" or "Only use the information given without outside assumptions." The reasoning models will respect these boundaries, and it can prevent them from going on tangents or hallucinating facts. This is important since O1 might otherwise produce a very exhaustive analysis – which is often great, but not if you explicitly need just a summary. Ensure any guidance around tone, role, and format is included each time. Control Verbosity and Depth Through Instructions While O1 and O3-mini will naturally engage in deep reasoning, you have control over how much of that reasoning is reflected in the output. If you want a detailed explanation, prompt for it (e.g. "Show your step-by-step reasoning in the answer"). They won't need the nudge to do the reasoning, but they do need to be told if you want to see it. Conversely, if you find the model's answers too verbose or technical for your purposes, instruct it to be more concise or to focus only on certain aspects. For example: "In 2-3 paragraphs, summarize the analysis with only the most critical points." The models are generally obedient to such instructions about length or focus. Keep in mind that O1's default behavior is to be thorough – it's optimized for correctness over brevity – so it may err on the side of giving more details. A direct request for brevity will override this tendency in most cases. For O3-mini, OpenAI provides an additional tool to manage depth: the "reasoning effort" parameter (low, medium, high). This setting lets the model know how hard to "think." In prompt terms, if using the API or a system that exposes this feature, you can dial it up for very complex tasks (ensuring maximum reasoning, at the cost of longer answers and latency) or dial it down for simpler tasks (faster, more streamlined answers). This is essentially another way to control verbosity and thoroughness.
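To make the role, format, and reasoning-effort guidance concrete, here is a minimal sketch of a call to an o3-mini deployment through the Azure OpenAI Python client. Treat it as a sketch under assumptions rather than a definitive recipe: the endpoint, API version, and deployment name are placeholders, and it assumes a recent version of the openai package.

```python
import os

from openai import AzureOpenAI

# Endpoint, key, API version, and deployment name are placeholders.
client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-12-01-preview",
)

response = client.chat.completions.create(
    model="o3-mini",              # your o3-mini deployment name
    reasoning_effort="high",      # low | medium | high
    messages=[
        {
            # The developer role plays the part of a system message for o-series models.
            "role": "developer",
            "content": "You are a legal analyst. Provide your findings as a list of key bullet points.",
        },
        {
            "role": "user",
            "content": "Summarize the key liability issues in the contract dispute described below: ...",
        },
    ],
)
print(response.choices[0].message.content)
```

Dialing reasoning_effort down to low trades some thoroughness for latency, mirroring the trade-off described above.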
If you don’t have direct access to that parameter, you can mimic a low effort mode by explicitly saying “Give a quick answer without deep analysis” for cases where speed matters more than perfect accuracy. Conversely, to mimic high effort, you might say “Take all necessary steps to arrive at a correct answer, even if the explanation is long.” These cues align with how the model’s internal setting would operate. Ensure Accuracy in Complex Tasks To get the most accurate responses on difficult problems, take advantage of the reasoning model’s strengths in your prompt. Since O1 can self-check and even catch contradictions, you can ask it to utilize that: e.g. “Analyze all the facts and double-check your conclusion for consistency.” Often it will do so unprompted, but reinforcing that instruction can signal the model to be extra careful. Interestingly, because O1 already self-fact-checks, you rarely need to prompt it with something like “verify each step” (that’s more helpful for GPT-4o). Instead, focus on providing complete and unambiguous information. If the question or task has potential ambiguities, clarify them in the prompt or instruct the model to list any assumptions. This prevents the model from guessing wrongly. Handling sources and data: If your task involves analyzing given data (like summarizing a document or computing an answer from provided numbers), make sure that data is clearly presented. O1/O3 will diligently use it. You can even break data into bullet points or a table for clarity. If the model must not hallucinate (say, in a legal context it shouldn’t make up laws), explicitly state “base your answer only on the information provided and common knowledge; do not fabricate any details.” The reasoning models are generally good at sticking to known facts, and such an instruction further reduces the chance of hallucinationIterate and verify: If the task is critical (for example, complex legal reasoning or a high-stakes engineering calculation), a prompt engineering technique is to ensemble the model’s responses. This isn’t a single prompt, but a strategy: you could run the query multiple times (or ask the model to consider alternative solutions) and then compare answers. O1’s stochastic nature means it might explore different reasoning paths each time. By comparing outputs or asking the model to “reflect if there are alternative interpretations” in a follow-up prompt, you can increase confidence in the result. While GPT-4o also benefits from this approach, it’s especially useful for O1 when absolute accuracy is paramount – essentially leveraging the model’s own depth by cross-verifying. Finally, remember that model selection is part of prompt engineering: If a question doesn’t actually require O1-level reasoning, using GPT-4o might be more efficient and just as accurate. OpenAI recommends saving O1 for the hard cases and using GPT-4o for the rest. So a meta-tip: assess task complexity first. If it’s simple, either prompt O1 very straightforwardly to avoid overthinking, or switch to GPT-4o. If it’s complex, lean into O1’s abilities with the techniques above. How O1/O3 Handle Logical Deduction vs. GPT-4o The way these reasoning models approach logical problems differs fundamentally from GPT-4o, and your prompt strategy should adapt accordingly: Handling Ambiguities: In logical deduction tasks, if there’s missing info or ambiguity, GPT-4o might make an assumption on the fly. 
O1 is more likely to flag the ambiguity or consider multiple possibilities because of its reflective approach. To leverage this, your prompt to O1 can directly ask: “If there are any uncertainties, state your assumptions before solving.” GPT-4 might need that nudge more. O1 might do it naturally or at least is less prone to assuming facts not given. So in comparing the two, O1’s deduction is cautious and thorough, whereas GPT-4o’s is swift and broad. Tailor your prompt accordingly – with GPT-4o, guide it to be careful; with O1, you mainly need to supply the information and let it do its thing. Step-by-Step Outputs: Sometimes you actually want the logical steps in the output (for teaching or transparency). With GPT-4o, you must explicitly request this (“please show your work”). O1 might include a structured rationale by default if the question is complex enough, but often it will present a well-reasoned answer without explicitly enumerating every step unless asked. If you want O1 to output the chain of logic, simply instruct it to — it will have no trouble doing so. In fact, O1-mini was noted to be capable of providing stepwise breakdowns (e.g., in coding problems) when prompted. Meanwhile, if you don’t want a long logical exposition from O1 (maybe you just want the final answer), you should say “Give the final answer directly” to skip the verbose explanation. Logical Rigor vs. Creativity: One more difference: GPT-4 (and 4o) has a streak of creativity and generative strength. Sometimes in logic problems, this can lead it to “imagine” scenarios or analogies, which isn’t always desired. O1 is more rigor-focused and will stick to logical analysis. If your prompt involves a scenario requiring both deduction and a bit of creativity (say, solving a mystery by piecing clues and adding a narrative), GPT-4 might handle the narrative better, while O1 will strictly focus on deduction. In prompt engineering, you might combine their strengths: use O1 to get the logical solution, then use GPT-4 to polish the presentation. If sticking to O1/O3 only, be aware that you might need to explicitly ask it for creative flourishes or more imaginative responses – they will prioritize logic and correctness by design. Key adjustment: In summary, to leverage O1/O3’s logical strengths, give them the toughest reasoning tasks as a single well-defined prompt. Let them internally grind through the logic (they’re built for it) without micromanaging their thought process. For GPT-4o, continue using classic prompt engineering (decompose the problem, ask for step-by-step reasoning, etc.) to coax out the same level of deduction. And always match the prompt style to the model – what confuses GPT-4o might be just right for O1, and vice versa, due to their different reasoning approaches. Crafting Effective Prompts: Best Practices Summary To consolidate the above into actionable guidelines, here’s a checklist of best practices when prompting O1 or O3-mini: Use Clear, Specific Instructions: Clearly state what you want the model to do or answer. Avoid irrelevant details. For complex questions, a straightforward ask often suffices (no need for elaborate role-play or multi-question prompts). Provide Necessary Context, Omit the Rest: Include any domain information the model will need (facts of a case, data for a math problem, etc.), since the model might not have up-to-date or niche knowledge. 
But don’t overload the prompt with unrelated text or too many examples – extra fluff can dilute the model’s focus Minimal or No Few-Shot Examples: By default, start with zero-shot prompts. If the model misinterprets the task or format, you can add one simple example as guidance, but never add long chains of examples for O1/O3. They don’t need it, and it can even degrade performance. Set the Role or Tone if Needed: Use a system message or a brief prefix to put the model in the right mindset (e.g. “You are a senior law clerk analyzing a case.”). This helps especially with tone (formal vs. casual) and ensures domain-appropriate language. Specify Output Format: If you expect the answer in a particular structure (list, outline, JSON, etc.), tell the model explicitly. The reasoning models will follow format instructions reliably. For instance: “Give your answer as an ordered list of steps.” Control Length and Detail via Instructions: If you want a brief answer, say so (“answer in one paragraph” or “just give a yes/no with one sentence explanation”). If you want an in-depth analysis, encourage it (“provide a detailed explanation”). Don’t assume the model knows your desired level of detail by default – instruct it. Leverage O3-mini’s Reasoning Effort Setting: When using O3-mini via API, choose the appropriate reasoning effort (low/medium/high) for the task. High gives more thorough answers (good for complex legal reasoning or tough math), low gives faster, shorter answers (good for quick checks or simpler queries). This is a unique way to tune the prompt behavior for O3-mini. Avoid Redundant “Think Step-by-Step” Prompts: Do not add phrases like “let’s think this through” or chain-of-thought directives for O1/O3; the model already does this internally. Save those tokens and only use such prompts on GPT-4o, where they have impact. Test and Iterate: Because these models can be sensitive to phrasing, if you don’t get a good answer, try rephrasing the question or tightening the instructions. You might find that a slight change (e.g. asking a direct question vs. an open-ended prompt) yields a significantly better response. Fortunately, O1/O3’s need for iteration is less than older models (they usually get complex tasks right in one go), but prompt tweaking can still help optimize clarity or format. Validate Important Outputs: For critical use-cases, don’t rely on a single prompt-answer cycle. Use follow-up prompts to ask the model to verify or justify its answer (“Are you confident in that conclusion? Explain why.”), or run the prompt again to see if you get consistent results. Consistency and well-justified answers indicate the model’s reasoning is solid. By following these techniques, you can harness O1 and O3-mini’s full capabilities and get highly optimized responses that play to their strengths. Applying Best Practices to a Legal Case Analysis Finally, let’s consider how these prompt engineering guidelines translate to a legal case analysis scenario (as mentioned earlier). Legal analysis is a perfect example of a complex reasoning task where O1 can be very effective, provided we craft the prompt well: Structure the Input: Start by clearly outlining the key facts of the case and the legal questions to be answered. For example, list the background facts as bullet points or a brief paragraph, then explicitly ask the legal question: “Given the above facts, determine whether Party A is liable for breach of contract under U.S. 
law." Structuring the prompt this way makes it easier for the model to parse the scenario. It also ensures no crucial detail is buried or overlooked. Provide Relevant Context or Law: If specific statutes, case precedents, or definitions are relevant, include them (or summaries of them) in the prompt. O1 doesn't have browsing and might not recall a niche law from memory, so if your analysis hinges on, say, the text of a particular law, give it to the model. For instance: "According to [Statute X excerpt], [provide text]… Apply this statute to the case." This way, the model has the necessary tools to reason accurately. Set the Role in the System Message: A system instruction like "You are a legal analyst who explains the application of law to facts in a clear, step-by-step manner." will cue the model to produce a formal, reasoned analysis. While O1 will already attempt careful reasoning, this instruction aligns its tone and structure with what we expect in legal discourse (e.g. citing facts, applying law, drawing conclusions). No Need for Multiple Examples: Don't supply a full example case analysis as a prompt (which you might consider doing with GPT-4o). O1 doesn't need an example to follow – it can perform the analysis from scratch. You might, however, briefly mention the desired format: "Provide your answer in an IRAC format (Issue, Rule, Analysis, Conclusion)." This format instruction gives a template without having to show a lengthy sample, and O1 will organize the output accordingly. Control Verbosity as Needed: If you want a thorough analysis of the case, let O1 output its comprehensive reasoning. The result may be several paragraphs covering each issue in depth. If you find the output too verbose or if you specifically need a succinct brief (for example, a quick advisory opinion), instruct the model: "Keep the analysis to a few key paragraphs focusing on the core issue." This ensures you get just the main points. On the other hand, if the initial answer seems too brief or superficial, you can prompt again: "Explain in more detail, especially how you applied the law to the facts." O1 will gladly elaborate because it has already done the heavy reasoning internally. Accuracy and Logical Consistency: Legal analysis demands accuracy in applying rules to facts. With O1, you can trust it to logically work through the problem, but it's wise to double-check any legal citations or specific claims it makes (since its training data might not have every detail). You can even add a prompt at the end like, "Double-check that all facts have been addressed and that the conclusion follows the law." Given O1's self-checking tendency, it may itself point out if something doesn't add up or if additional assumptions were needed. This is a useful safety net in a domain where subtle distinctions matter. Use Follow-Up Queries: In a legal scenario, it's common to have follow-up questions. For instance, if O1 gives an analysis, you might ask, "What if the contract had a different clause about termination? How would that change the analysis?" O1 can handle these iterative questions well, carrying over its reasoning. Just remember that if the interface you are working with doesn't have long-term memory beyond the current conversation context (and no browsing), each follow-up should either rely on the context provided or include any new information needed. Keep the conversation focused on the case facts at hand to prevent confusion.
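Pulling these guidelines together, the prompt for the scenario above might be assembled as in the following sketch. The facts, statute excerpt, and question are invented placeholders, and the client is the same AzureOpenAI client shown earlier; swap in your own o1 deployment name.

```python
facts = """Background facts:
- Party A agreed to deliver 500 units to Party B by March 1 under a written contract.
- The contract contains a force majeure clause covering "natural disasters and government action."
- Party A delivered on March 20, citing a port closure ordered by local authorities.
- Party B refused delivery and claims damages for late performance.
"""

statute = "According to [Statute X excerpt]: [provide text]"  # supply the actual rule you rely on

question = (
    "Given the above facts and the rule provided, determine whether Party A is liable for "
    "breach of contract. Provide your answer in an IRAC format (Issue, Rule, Analysis, Conclusion). "
    "Base your analysis only on the information provided and state any assumptions explicitly."
)

response = client.chat.completions.create(
    model="o1",  # your o1 deployment name (placeholder)
    messages=[
        {
            "role": "developer",
            "content": (
                "You are a legal analyst who explains the application of law "
                "to facts in a clear, step-by-step manner."
            ),
        },
        {"role": "user", "content": f"{facts}\n{statute}\n\n{question}"},
    ],
)
print(response.choices[0].message.content)

# A follow-up query can be appended to the same message list, for example:
# "What if the contract had a different clause about termination? How would that change the analysis?"
```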
By applying these best practices, your prompts will guide O1 or O3-mini to deliver high-quality legal analysis. In summary, clearly present the case, specify the task, and let the reasoning model do the heavy lifting. The result should be a well-reasoned, step-by-step legal discussion that leverages O1's logical prowess, all optimized through effective prompt construction. Using OpenAI's reasoning models in this way allows you to tap into their strength in complex problem-solving while maintaining control over the style and clarity of the output. As OpenAI's own documentation notes, the O1 series excels at deep reasoning tasks in domains like research and strategy – legal analysis similarly benefits from this capability. By understanding the differences from GPT-4o and adjusting your prompt approach accordingly, you can maximize the performance of O1 and O3-mini and obtain accurate, well-structured answers even for the most challenging reasoning tasks.

Learn about Azure AI during the Global AI Bootcamp 2025
The Global AI Bootcamp is starting next week, and it's more exciting than ever! With 135 bootcamps in 44 countries, this is your chance to be part of a global movement in AI innovation. 🤖🌍 From Germany to India, Nigeria to Canada, and beyond, join us for hands-on workshops, expert talks, and networking opportunities that will boost your AI skills and career. Whether you're a seasoned pro or just starting out, there's something for everyone! 🚀 Why Attend? 🛠️ Hands-on Workshops: Build and deploy AI models. 🎤 Expert Talks: Learn the latest trends from industry leaders. 🤝 Network: Connect with peers, mentors, and potential collaborators. 📈 Career Growth: Discover new career paths in AI. Don't miss this incredible opportunity to learn, connect, and grow! Check out the event in your city or join virtually. Let's shape the future of AI together! 🌟 👉 Explore All Bootcamps

Built-in Enterprise Readiness with Azure AI Agent Service
Ensure enterprise-grade security and compliance with Private Network Isolation (BYO VNet) in Azure AI Agent Service. This feature allows AI agents to operate within a private, isolated network, giving organizations full control over data and networking configurations. Learn how Private Network Isolation enhances security, scalability, and compliance for mission-critical AI workloads.

Announcing Provisioned Deployment for Azure OpenAI Service Fine-tuning
You've fine-tuned your models to make your agents behave and speak how you'd like. You've scaled up your RAG application to meet customer demand. You've now got a good problem: users love the service but want it snappier and more responsive. Azure OpenAI Service now offers provisioned deployments for fine-tuned models, giving your applications predictable performance with predictable costs! 💡 What is Provisioned Throughput? If you're unfamiliar with Provisioned Throughput, it allows Azure OpenAI Service customers to purchase capacity in terms of performance needs instead of per-token. With fine-tuned deployments, it replaces both the hosting fee and the token-based billing of Standard and Global Standard (now in Public Preview) with a throughput-based capacity unit called a provisioned throughput unit (PTU). Every PTU corresponds to a commitment of both latency and throughput in Tokens per Minute (TPM). This differs from Standard and Global Standard, which only provide availability guarantees and best-effort performance. 🤔 Is this the same PTU I'm already using? You might already be using Provisioned Throughput Units with base models, and with fine-tuned models they work the same way. In fact, they're completely interchangeable! Already have quota in North Central US for 800 PTU and an annual Azure reservation rate? PTUs are interchangeable and model-independent, meaning you can get started with using them for fine-tuning immediately without any additional steps. Just select Provisioned Managed (Public Preview) from the model deployment dialog and set your PTU allotment. 📋 What's available in Public Preview? We're offering provisioned deployment in two regions, North Central US and Switzerland West, for both gpt-4o (2024-08-06) and gpt-4o-mini (2024-07-18) to support Azure OpenAI Service customers. If your workload requires regions other than the above, please make sure to submit a request so we can consider it for General Availability. 🙏 🚀 How do I get started? If you don't already have PTU quota from base models, the easiest way to get started and shift your fine-tuned deployments to provisioned is: Understand your workload needs. Is it spiky but with a baseline demand? Review some of our previous materials on right-sizing PTUs (or have Copilot summarize it for you 😆). Estimate the PTUs you need for your workload by using the calculator. Increase your regional PTU quota, if required. Deploy your fine-tuned models to secure your Provisioned Throughput capacity (see the scripted sketch at the end of this post). Make sure to purchase an Azure Reservation to cover your PTU usage to save big. Have a spiky workload? Combine PTU and Standard/Global Standard and configure your architecture for spillover. Have feedback as you continue on your PTU journey with Azure OpenAI Service? Let us know how we can make it better!
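For readers who prefer to script the deployment step from the list above rather than use the portal, here is a minimal sketch using the azure-mgmt-cognitiveservices package. It is illustrative only: the subscription, resource group, account name, fine-tuned model ID, and PTU capacity are placeholders, and the capacity value should come from the PTU calculator.

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.cognitiveservices import CognitiveServicesManagementClient
from azure.mgmt.cognitiveservices.models import (
    Deployment,
    DeploymentModel,
    DeploymentProperties,
    Sku,
)

# All names below are placeholders for your own resources.
client = CognitiveServicesManagementClient(
    credential=DefaultAzureCredential(),
    subscription_id="<subscription-id>",
)

poller = client.deployments.begin_create_or_update(
    resource_group_name="<resource-group>",
    account_name="<azure-openai-resource>",
    deployment_name="my-finetuned-gpt-4o-mini-ptu",
    deployment=Deployment(
        sku=Sku(name="ProvisionedManaged", capacity=50),  # capacity = PTUs, sized with the calculator
        properties=DeploymentProperties(
            model=DeploymentModel(
                format="OpenAI",
                name="gpt-4o-mini-2024-07-18.ft-<job-id>",  # your fine-tuned model ID
                version="1",
            )
        ),
    ),
)
deployment = poller.result()
print(deployment.name, deployment.properties.provisioning_state)
```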