Power Up Your Open WebUI with Azure AI Speech: Quick STT & TTS Integration
Introduction
Ever found yourself wishing your web interface could really talk and listen back to you? With a few clicks (and a bit of code), you can turn your plain Open WebUI into a full-on voice assistant. In this post, you'll see how to spin up an Azure Speech resource, hook it into your frontend, and watch as user speech transforms into text and your app's responses leap off the screen in a human-like voice. By the end of this guide, you'll have a voice-enabled web UI that actually converses with users, opening the door to hands-free controls, better accessibility, and a genuinely richer user experience. Ready to make your web app speak? Let's dive in.

Why Azure AI Speech?
We use the Azure AI Speech service in Open Web UI to enable voice interactions directly within web applications. This allows users to:
Speak commands or input instead of typing, making the interface more accessible and user-friendly.
Hear responses or information read aloud, which improves usability for people with visual impairments or those who prefer audio.
Enjoy a more natural and hands-free experience, especially on devices like smartphones or tablets.
In short, integrating the Azure AI Speech service into Open Web UI helps make web apps smarter, more interactive, and easier to use by adding speech recognition and voice output features.
If you haven't hosted Open WebUI already, follow my other step-by-step guide to host Ollama WebUI on Azure. Proceed to the next step if you have Open WebUI deployed already. Learn more about Open WebUI here.

Deploy the Azure AI Speech service in Azure
Navigate to the Azure Portal and search for Azure AI Speech in the portal search bar. Create a new Speech service by filling in the fields on the resource creation page. Click "Create" to finalize the setup. After the resource has been deployed, click the "View resource" button and you should be redirected to the Azure AI Speech service page. The page should display the API keys and endpoints for the Azure AI Speech service, which you can use in Open Web UI.

Setting things up in Open Web UI
Speech to Text settings (STT)
Head to the Open Web UI Admin page > Settings > Audio. Paste the API key obtained from the Azure AI Speech service page into the API key field. Unless you use a different Azure region or want to change the default STT configuration, leave the remaining settings blank.

Text to Speech settings (TTS)
Now, let's configure the TTS settings in Open Web UI by toggling the TTS Engine to the Azure AI Speech option. Again, paste the API key obtained from the Azure AI Speech service page and leave the remaining settings blank. You can change the TTS voice from the dropdown selection in the TTS settings. Click Save to apply the change.

Expected Result
Now, let's test if everything works well. Open a new chat or temporary chat in Open Web UI and click the Call / Record button. The STT engine (Azure AI Speech) should recognize your voice, and the app should respond based on the voice input. To test the TTS feature, click Read Aloud (the speaker icon) under any response from Open Web UI. The TTS engine should reflect the Azure AI Speech service!
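If you want to confirm that the Speech resource itself works before pointing Open WebUI at it, a quick script against the Speech SDK can help. This is a minimal sketch, assuming you have installed the azure-cognitiveservices-speech package and exported your resource's key and region as SPEECH_KEY and SPEECH_REGION (these environment variable names are illustrative, not anything Open WebUI reads):

import os
import azure.cognitiveservices.speech as speechsdk

# Assumed environment variables holding the key and region of your Speech resource
speech_config = speechsdk.SpeechConfig(
    subscription=os.environ["SPEECH_KEY"],
    region=os.environ["SPEECH_REGION"],
)

# Text to speech: synthesize a short phrase through the default speaker
synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config)
tts_result = synthesizer.speak_text_async("Azure AI Speech is configured correctly.").get()
print("TTS result:", tts_result.reason)

# Speech to text: listen once on the default microphone and print the transcript
recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config)
stt_result = recognizer.recognize_once_async().get()
print("STT transcript:", stt_result.text)

If both calls succeed, the same key and region should work in the Open Web UI audio settings.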
Conclusion
And that's a wrap! You've just given your Open WebUI the gift of capturing user speech, turning it into text, and then talking right back with Azure's neural voices. Along the way you saw how easy it is to spin up a Speech resource in the Azure portal, wire up real-time transcription in the browser, and pipe responses through the TTS engine.
From here, it's all about experimentation. Try swapping in different neural voices or dialing in new languages. Tweak how you start and stop listening, play with silence detection, or add custom pronunciation tweaks for those tricky product names. Before you know it, your interface will feel less like a web page and more like a conversation partner.
Step-by-step: Integrate Ollama Web UI to use Azure Open AI API with LiteLLM Proxy
Introduction
Ollama WebUI is a streamlined interface for deploying and interacting with open-source large language models (LLMs) like Llama 3 and Mistral, enabling users to manage models, test them via a ChatGPT-like chat environment, and integrate them into applications through Ollama's local API. While it excels for self-hosted models on platforms like Azure VMs, it does not natively support Azure OpenAI API endpoints; OpenAI's proprietary models (e.g., GPT-4) remain accessible only through OpenAI's managed API. However, tools like LiteLLM bridge this gap, allowing developers to combine Ollama-hosted models with OpenAI's API in hybrid workflows, while maintaining compliance and cost-efficiency. This setup empowers users to leverage both self-managed open-source models and cloud-based AI services.

Problem Statement
As of February 2025, Ollama WebUI still does not support the Azure OpenAI API. Ollama Web UI only supports the self-hosted Ollama API and the managed OpenAI API service (PaaS). This is an issue for users who want to use OpenAI models they have already deployed on Azure AI Foundry.

Objective
To integrate the Azure OpenAI API with Ollama Web UI via a LiteLLM proxy. LiteLLM translates Azure OpenAI API requests into OpenAI-style requests that Ollama Web UI understands, allowing users to use OpenAI models deployed on Azure AI Foundry. If you haven't hosted Ollama WebUI already, follow my other step-by-step guide to host Ollama WebUI on Azure. Proceed to the next step if you have Ollama WebUI deployed already.

Step 1: Deploy OpenAI models on Azure AI Foundry
If you haven't created an Azure AI Hub already, search for Azure AI Foundry on Azure and click the "+ Create" button > Hub. Fill out all the empty fields with the appropriate configuration and click "Create". After the Azure AI Hub is successfully deployed, click on the deployed resource and launch the Azure AI Foundry service.
To deploy new models on Azure AI Foundry, find the "Models + Endpoints" section on the left-hand side and click the "+ Deploy Model" button > "Deploy base model". A popup will appear where you can choose which models to deploy on Azure AI Foundry. Please note that the o-series models are only available to select customers at the moment; you can request access by completing the request access form and waiting until Microsoft approves it. Click "Confirm" and another popup will appear. Name the deployment and click "Deploy" to deploy the model. Wait a few moments for the model to deploy. Once it has successfully deployed, save the "Target URI" and the API key.
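Before putting LiteLLM in front of the deployment, it can be worth checking that the deployment answers on its own. Below is a minimal sketch using the AzureOpenAI client from the openai Python package; the environment variable names, API version, and the gpt-4o deployment name are assumptions for illustration, so substitute the Target URI, key, api-version, and deployment name you saved above:

import os
from openai import AzureOpenAI

# Assumed environment variables; the endpoint is the resource base URL, not the full Target URI
client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],  # e.g. https://example.openai.azure.com/
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-08-01-preview",  # taken from the end of the Target URI
)

response = client.chat.completions.create(
    model="gpt-4o",  # the deployment name you chose in Azure AI Foundry (assumed here)
    messages=[{"role": "user", "content": "Say hello from Azure AI Foundry."}],
)
print(response.choices[0].message.content)

If this prints a greeting, the deployment, key, and API version are ready for the LiteLLM configuration in the next step.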
Step 2: Deploy LiteLLM Proxy via Docker Container
Before pulling the LiteLLM image into the host environment, create a file named "litellm_config.yaml" and list the models you deployed on Azure AI Foundry, along with the API endpoints and keys. Replace "API_Endpoint" and "API_Key" with the "Target URI" and "Key" found in Azure AI Foundry respectively. Template for the "litellm_config.yaml" file:

model_list:
  - model_name: [model_name]
    litellm_params:
      model: azure/[model_name_on_azure]
      api_base: "[API_ENDPOINT/Target_URI]"
      api_key: "[API_Key]"
      api_version: "[API_Version]"

Tips: You can find the API version info at the end of the Target URI of the model's endpoint. Sample endpoint: https://example.openai.azure.com/openai/deployments/o1-mini/chat/completions?api-version=2024-08-01-preview

Run the docker command below to start LiteLLM Proxy with the correct settings:

docker run -d \
  -v $(pwd)/litellm_config.yaml:/app/config.yaml \
  -p 4000:4000 \
  --name litellm-proxy-v1 \
  --restart always \
  ghcr.io/berriai/litellm:main-latest \
  --config /app/config.yaml --detailed_debug

Make sure to run the docker command inside the directory where you created the "litellm_config.yaml" file. The port used to listen for LiteLLM Proxy traffic is port 4000.

Now that the LiteLLM proxy has been deployed on port 4000, let's change the OpenAI API settings in Ollama WebUI. Navigate to Ollama WebUI's Admin Panel > Settings > Connections and, under the OpenAI API section, enter http://127.0.0.1:4000 as the API endpoint and set any value as the key (the key field cannot be left empty). Click the "Save" button to apply the changes. Refresh the browser and you should see the AI models deployed on Azure AI Foundry listed in Ollama WebUI.
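If the models do not show up, it can help to query the proxy directly before troubleshooting Ollama WebUI. Here is a small sketch using the standard openai package against the local proxy; the model name o1-mini and the placeholder key are assumptions based on the sample deployment above:

from openai import OpenAI

# LiteLLM exposes an OpenAI-compatible API, so the standard client works;
# the key value is arbitrary unless you configured keys in LiteLLM
client = OpenAI(base_url="http://127.0.0.1:4000", api_key="anything")

# List the models the proxy loaded from litellm_config.yaml
for model in client.models.list():
    print(model.id)

# Send a test chat completion through the proxy to the Azure deployment
response = client.chat.completions.create(
    model="o1-mini",  # must match a model_name from litellm_config.yaml (assumed here)
    messages=[{"role": "user", "content": "What is (4 + 6) x 14?"}],
)
print(response.choices[0].message.content)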
Now let's test the chat completion and Web Search capability using the "o1-mini" model in Ollama WebUI.

Conclusion
Hosting Ollama WebUI on an Azure VM and integrating it with OpenAI's API via LiteLLM offers a powerful, flexible approach to AI deployment, combining the cost-efficiency of open-source models with the advanced capabilities of managed cloud services. While Ollama itself doesn't support Azure OpenAI endpoints, this hybrid architecture empowers IT teams to balance data privacy (via self-hosted models on Azure AI Foundry) and cutting-edge performance (using the Azure OpenAI API), all within Azure's scalable ecosystem. This guide covers every step required to deploy your OpenAI models on Azure AI Foundry, set up the required resources, deploy LiteLLM Proxy on your host machine, and configure Ollama WebUI to support Azure AI endpoints. You can test and improve your AI model even further with the Ollama WebUI interface, with Web Search, Text-to-Image Generation, and more, all in one place.

Build faster with this simple AZD template for FastAPI on Azure App Service
The world of DevOps and deployment configuration options just keeps growing, but you just want to build your web app and get it out to the world! I created this simple FastAPI AZD template for Azure App Service to help you get to the fun part while deploying for free. Learn why and how to give it a go!

Smart Auditing: Leveraging Azure AI Agents to Transform Financial Oversight
In today's data-driven business environment, audit teams often spend weeks poring over logs and databases to verify spending and billing information. This time-consuming process is ripe for automation. But is there a way to implement AI solutions without getting lost in complex technical frameworks? While tools like LangChain, Semantic Kernel, and AutoGen offer powerful AI agent capabilities, sometimes you need a straightforward solution that just works. So, what's the answer for teams seeking simplicity without sacrificing effectiveness? This tutorial will show you how to use Azure AI Agent Service to build an AI agent that can directly access your Postgres database to streamline audit workflows. No complex chains or graphs required, just a practical solution to get your audit process automated quickly.

The Auditing Challenge:
It's the month end, and your audit team is drowning in spreadsheets. As auditors reviewing financial data across multiple SaaS tenants, you're tasked with verifying billing accuracy by tracking usage metrics like API calls, storage consumption, and user sessions in Postgres databases. Each tenant generates thousands of transactions daily, and traditionally, this verification process consumes weeks of your team's valuable time. Typically, teams spend weeks:
Manually extracting data from multiple database tables.
Cross-referencing usage with invoices.
Investigating anomalies through tedious log analysis.
Compiling findings into comprehensive reports.
With an AI-powered audit agent, you can automate these tasks and transform the process. Your AI assistant can:
Pull relevant usage data directly from your database.
Identify billing anomalies like unexpected usage spikes.
Generate natural language explanations of findings.
Create audit reports that highlight key concerns.
For example, when reviewing a tenant's invoice, your audit agent can query the database for relevant usage patterns, summarize anomalies, and offer explanations: "Tenant_456 experienced a 145% increase in API usage on April 30th, which explains the billing increase. This spike falls outside normal usage patterns and warrants further investigation."
Let's build an AI agent that connects to your Postgres database and transforms your audit process from manual effort to automated intelligence.

Prerequisites:
Before we start building our audit agent, you'll need:
An Azure subscription (create one for free).
The Azure AI Developer RBAC role assigned to your account.
Python 3.11.x installed on your development machine.
Alternatively, you can use GitHub Codespaces, which will automatically install all dependencies for you. You'll need to create a GitHub account first if you don't already have one.

Setting Up Your Database:
For this tutorial, we'll use Neon Serverless Postgres as our database. It's a fully managed, cloud-native Postgres solution that's free to start, scales automatically, and works excellently for AI agents that need to query data on demand.

Creating a Neon Database on Azure:
Open the Neon Resource page on the Azure portal.
Fill out the form with the required fields and deploy your database.
After creation, navigate to the Neon Serverless Postgres Organization service.
Click on the Portal URL to access the Neon Console.
Click "New Project".
Choose an Azure region.
Name your project (e.g., "Audit Agent Database").
Click "Create Project".
Once your project is successfully created, copy the Neon connection string from the Connection Details widget on the Neon Dashboard. It will look like this:

postgresql://[user]:[password]@[neon_hostname]/[dbname]?sslmode=require

Note: Keep this connection string saved; we'll need it shortly.
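Before wiring anything else up, it can be worth confirming that the connection string actually reaches the database. The sketch below assumes you have installed sqlalchemy and psycopg2-binary (both appear in this project's requirements) and exported the string as NEON_DB_CONNECTION_STRING:

import os
from sqlalchemy import create_engine, text

# Assumes NEON_DB_CONNECTION_STRING holds the string copied from the Neon dashboard
engine = create_engine(os.environ["NEON_DB_CONNECTION_STRING"])

# Run a trivial query; if this prints 1, the connection string and SSL settings are fine
with engine.connect() as conn:
    print(conn.execute(text("SELECT 1")).scalar())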
Creating an AI Foundry Project on Azure:
Next, we'll set up the AI infrastructure to power our audit agent:
Create a new hub and project in the Azure AI Foundry portal by following the guide.
Deploy a model like GPT-4o to use with your agent.
Make note of your Project connection string and Model Deployment name. You can find your connection string in the overview section of your project in the Azure AI Foundry portal, under Project details > Project connection string.
Once you have all three values on hand (Neon connection string, Project connection string, and Model Deployment Name), you are ready to set up the Python project to create an agent. All the code and sample data are available in this GitHub repository. You can clone or download the project.

Project Environment Setup:
Create a .env file with your credentials:

PROJECT_CONNECTION_STRING="<Your AI Foundry connection string>"
AZURE_OPENAI_DEPLOYMENT_NAME="gpt4o"
NEON_DB_CONNECTION_STRING="<Your Neon connection string>"

Create and activate a virtual environment:

python -m venv .venv
source .venv/bin/activate  # on macOS/Linux
.venv\Scripts\activate     # on Windows

Install the required Python libraries:

pip install -r requirements.txt

Example requirements.txt:

pandas
python-dotenv
sqlalchemy
psycopg2-binary
azure-ai-projects==1.0.0b7
azure-identity

Load Sample Billing Usage Data:
We will use a mock dataset of tenant usage, covering daily API calls and storage usage in GB:

tenant_id   date        api_calls  storage_gb
tenant_456  2025-04-01  1000       25.0
tenant_456  2025-03-31  950        24.8
tenant_456  2025-03-30  2200       26.0

Run the load_usage_data.py Python script to create and populate the usage_data table in your Neon Serverless Postgres instance:

# load_usage_data.py file
import os

from dotenv import load_dotenv
from sqlalchemy import (
    create_engine,
    MetaData,
    Table,
    Column,
    String,
    Date,
    Integer,
    Numeric,
)

# Load environment variables from .env
load_dotenv()

# Load connection string from environment variable
NEON_DB_URL = os.getenv("NEON_DB_CONNECTION_STRING")
engine = create_engine(NEON_DB_URL)

# Define metadata and table schema
metadata = MetaData()

usage_data = Table(
    "usage_data",
    metadata,
    Column("tenant_id", String, primary_key=True),
    Column("date", Date, primary_key=True),
    Column("api_calls", Integer),
    Column("storage_gb", Numeric),
)

# Create table
with engine.begin() as conn:
    metadata.create_all(conn)

    # Insert mock data
    conn.execute(
        usage_data.insert(),
        [
            {
                "tenant_id": "tenant_456",
                "date": "2025-03-27",
                "api_calls": 870,
                "storage_gb": 23.9,
            },
            {
                "tenant_id": "tenant_456",
                "date": "2025-03-28",
                "api_calls": 880,
                "storage_gb": 24.0,
            },
            {
                "tenant_id": "tenant_456",
                "date": "2025-03-29",
                "api_calls": 900,
                "storage_gb": 24.5,
            },
            {
                "tenant_id": "tenant_456",
                "date": "2025-03-30",
                "api_calls": 2200,
                "storage_gb": 26.0,
            },
            {
                "tenant_id": "tenant_456",
                "date": "2025-03-31",
                "api_calls": 950,
                "storage_gb": 24.8,
            },
            {
                "tenant_id": "tenant_456",
                "date": "2025-04-01",
                "api_calls": 1000,
                "storage_gb": 25.0,
            },
        ],
    )

print("✅ usage_data table created and mock data inserted.")

Create a Postgres Tool for the Agent:
Next, we configure an AI agent tool to retrieve data from Postgres. The Python script billing_agent_tools.py contains:
The function billing_anomaly_summary(), which:
Pulls usage data from Neon.
Computes the % change in api_calls.
Flags anomalies with a threshold of > 1.5x change.
It also exports the user_functions list for the Azure AI Agent to use. You do not need to run it separately.

# billing_agent_tools.py file
import os
import json

import pandas as pd
from sqlalchemy import create_engine
from dotenv import load_dotenv

# Load environment variables
load_dotenv()

# Set up the database engine
NEON_DB_URL = os.getenv("NEON_DB_CONNECTION_STRING")
db_engine = create_engine(NEON_DB_URL)


# Define the billing anomaly detection function
def billing_anomaly_summary(
    tenant_id: str,
    start_date: str = "2025-03-27",
    end_date: str = "2025-04-01",
    limit: int = 10,
) -> str:
    """
    Fetches recent usage data for a SaaS tenant and detects potential billing anomalies.

    :param tenant_id: The tenant ID to analyze.
    :type tenant_id: str
    :param start_date: Start date for the usage window.
    :type start_date: str
    :param end_date: End date for the usage window.
    :type end_date: str
    :param limit: Maximum number of records to return.
    :type limit: int
    :return: A JSON string with usage records and anomaly flags.
    :rtype: str
    """
    query = """
        SELECT date, api_calls, storage_gb
        FROM usage_data
        WHERE tenant_id = %s AND date BETWEEN %s AND %s
        ORDER BY date DESC
        LIMIT %s;
    """
    df = pd.read_sql(query, db_engine, params=(tenant_id, start_date, end_date, limit))

    if df.empty:
        return json.dumps(
            {"message": "No usage data found for this tenant in the specified range."}
        )

    df.sort_values("date", inplace=True)
    df["pct_change_api"] = df["api_calls"].pct_change()
    df["anomaly"] = df["pct_change_api"].abs() > 1.5

    return df.to_json(orient="records")


# Register this in a list to be used by FunctionTool
user_functions = [billing_anomaly_summary]

Create and Configure the AI Agent:
Now we'll set up the AI agent and integrate it with our Neon Postgres tool using the Azure AI Agent Service SDK. The Python script does the following:
Creates the agent: instantiates an AI agent using the selected model (gpt-4o, for example), adds tool access, and sets instructions that tell the agent how to behave (e.g., "You are a helpful SaaS assistant…").
Creates a conversation thread: a thread is started to hold a conversation between the user and the agent.
Posts a user message: sends a question like "Why did my billing spike for tenant_456 this week?" to the agent.
Processes the request: the agent reads the message, determines that it should use the custom tool to retrieve usage data, and processes the query.
Displays the response: prints the response from the agent with a natural language explanation based on the tool's output.
# billing_anomaly_agent.py
import os
from datetime import datetime
from pprint import pprint

from azure.ai.projects import AIProjectClient
from azure.ai.projects.models import FunctionTool, ToolSet
from azure.identity import DefaultAzureCredential
from dotenv import load_dotenv

from billing_agent_tools import user_functions  # Custom tool function module

# Load environment variables from .env file
load_dotenv()

# Create an Azure AI Project Client
project_client = AIProjectClient.from_connection_string(
    credential=DefaultAzureCredential(),
    conn_str=os.environ["PROJECT_CONNECTION_STRING"],
)

# Initialize toolset with our user-defined functions
functions = FunctionTool(user_functions)
toolset = ToolSet()
toolset.add(functions)

# Create the agent
agent = project_client.agents.create_agent(
    model=os.environ["AZURE_OPENAI_DEPLOYMENT_NAME"],
    name=f"billing-anomaly-agent-{datetime.now().strftime('%Y%m%d%H%M')}",
    description="Billing Anomaly Detection Agent",
    instructions=f"""
    You are a helpful SaaS financial assistant that retrieves and explains billing anomalies using usage data.
    The current date is {datetime.now().strftime("%Y-%m-%d")}.
    """,
    toolset=toolset,
)
print(f"Created agent, ID: {agent.id}")

# Create a communication thread
thread = project_client.agents.create_thread()
print(f"Created thread, ID: {thread.id}")

# Post a message to the agent thread
message = project_client.agents.create_message(
    thread_id=thread.id,
    role="user",
    content="Why did my billing spike for tenant_456 this week?",
)
print(f"Created message, ID: {message.id}")

# Run the agent and process the query
run = project_client.agents.create_and_process_run(
    thread_id=thread.id, agent_id=agent.id
)
print(f"Run finished with status: {run.status}")

if run.status == "failed":
    print(f"Run failed: {run.last_error}")

# Fetch and display the messages
messages = project_client.agents.list_messages(thread_id=thread.id)
print("Messages:")
pprint(messages["data"][0]["content"][0]["text"]["value"])

# Optional cleanup:
# project_client.agents.delete_agent(agent.id)
# print("Deleted agent")

Run the agent:
To run the agent, run the following command:

python billing_anomaly_agent.py

Using the Azure AI Foundry Agent Playground:
After running your agent using the Azure AI Agent SDK, it is saved within your Azure AI Foundry project. You can now experiment with it using the Agent Playground. To try it out:
Go to the Agents section in your Azure AI Foundry workspace.
Find your billing anomaly agent in the list and click to open it.
Use the playground interface to test different financial or billing-related questions, such as "Did tenant_456 exceed their API usage quota this month?" or "Explain recent storage usage changes for tenant_456."
This is a great way to validate your agent's behavior without writing more code.

Summary:
You've now created a working AI agent that talks to your Postgres database, all using:
A simple Python function
Azure AI Agent Service
A Neon Serverless Postgres backend
This approach is beginner-friendly, lightweight, and practical for real-world use. Want to go further?
You can:
Add more tools to the agent (a sketch of registering a second tool follows below).
Integrate with vector search (e.g., detect anomaly reasons from logs using embeddings).
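As an illustration of the first idea, the sketch below registers a hypothetical second function alongside billing_anomaly_summary. The storage_trend_summary name and its query are inventions for this example, not part of the original project:

# In billing_agent_tools.py, a second tool could sit next to billing_anomaly_summary.
# storage_trend_summary is a hypothetical example, not part of the original repository.
def storage_trend_summary(tenant_id: str, days: int = 7) -> str:
    """Returns recent storage usage for a tenant as JSON so the agent can explain trends."""
    query = """
        SELECT date, storage_gb
        FROM usage_data
        WHERE tenant_id = %s
        ORDER BY date DESC
        LIMIT %s;
    """
    df = pd.read_sql(query, db_engine, params=(tenant_id, days))
    return df.to_json(orient="records")


# Exposing both functions is enough; FunctionTool picks them up from this list
user_functions = [billing_anomaly_summary, storage_trend_summary]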
Resources:
Introduction to Azure AI Agent Service
Develop an AI agent with Azure AI Agent Service
Getting Started with Azure AI Agent Service
Neon on Azure
Build AI Agents with Azure AI Agent Service and Neon
Multi-Agent AI Solution with Neon, Langchain, AutoGen and Azure OpenAI
Azure AI Foundry GitHub Discussions

That's it, folks! But the best part? You can become part of a thriving community of learners and builders by joining the Microsoft Learn Student Ambassadors Community. Connect with like-minded individuals, explore hands-on projects, and stay updated with the latest in cloud and AI. 💬 Join the community on Discord here and explore more benefits on the Microsoft Learn Student Hub.

Unleashing the Power of Model Context Protocol (MCP): A Game-Changer in AI Integration
Artificial Intelligence is evolving rapidly, and one of the most pressing challenges is enabling AI models to interact effectively with external tools, data sources, and APIs. The Model Context Protocol (MCP) solves this problem by acting as a bridge between AI models and external services, creating a standardized communication framework that enhances tool integration, accessibility, and AI reasoning capabilities.

What is Model Context Protocol (MCP)?
MCP is a protocol designed to enable AI models, such as Azure OpenAI models, to interact seamlessly with external tools and services. Think of MCP as a universal USB-C connector for AI, allowing language models to fetch information, interact with APIs, and execute tasks beyond their built-in knowledge.

Key Features of MCP
Standardized Communication – MCP provides a structured way for AI models to interact with various tools.
Tool Access & Expansion – AI assistants can now utilize external tools for real-time insights.
Secure & Scalable – Enables safe and scalable integration with enterprise applications.
Multi-Modal Integration – Supports STDIO, SSE (Server-Sent Events), and WebSocket communication methods.

MCP Architecture & How It Works
MCP follows a client-server architecture that allows AI models to interact with external tools efficiently. Here's how it works:

Components of MCP
MCP Host – The AI model (e.g., Azure OpenAI GPT) requesting data or actions.
MCP Client – An intermediary service that forwards the AI model's requests to MCP servers.
MCP Server – Lightweight applications that expose specific capabilities (APIs, databases, files, etc.).
Data Sources – Various backend systems, including local storage, cloud databases, and external APIs.

Data Flow in MCP
The AI model sends a request (e.g., "fetch user profile data").
The MCP client forwards the request to the appropriate MCP server.
The MCP server retrieves the required data from a database or API.
The response is sent back to the AI model via the MCP client.

Integrating MCP with Azure OpenAI Services
Microsoft has integrated MCP with Azure OpenAI Services, allowing GPT models to interact with external services and fetch live data. This means AI models are no longer limited to static knowledge but can access real-time information.

Benefits of Azure OpenAI Services + MCP Integration
✔ Real-time Data Fetching – AI assistants can retrieve fresh information from APIs, databases, and internal systems.
✔ Contextual AI Responses – Enhances AI responses by providing accurate, up-to-date information.
✔ Enterprise-Ready – Secure and scalable for business applications, including finance, healthcare, and retail.

Hands-On Tools for MCP Implementation
To implement MCP effectively, Microsoft provides two powerful tools: Semantic Workbench and AI Gateway.

Microsoft Semantic Workbench
A development environment for prototyping AI-powered assistants and integrating MCP-based functionalities. Features:
Build and test multi-agent AI assistants.
Configure settings and interactions between AI models and external tools.
Supports GitHub Codespaces for cloud-based development.
Explore Semantic Workbench
Workbench interface examples

Microsoft AI Gateway
A plug-and-play interface that allows developers to experiment with MCP using Azure API Management. Features:
Credential Manager – Securely handle API credentials.
Live Experimentation – Test AI model interactions with external tools.
Pre-built Labs – Hands-on learning for developers.
Explore AI Gateway

Setting Up MCP with Azure OpenAI Services
Step 1: Create a Virtual Environment
First, create a virtual environment using Python:

python -m venv .venv

Activate the environment:

# Windows
.venv\Scripts\activate
# MacOS/Linux
source .venv/bin/activate

Step 2: Install Required Libraries
Create a requirements.txt file and add the following dependencies:

langchain-mcp-adapters
langgraph
langchain-openai

Then, install the required libraries:

pip install -r requirements.txt

Step 3: Set Up OpenAI API Key
Ensure you have your OpenAI API key set up:

# Windows
setx OPENAI_API_KEY "<your_api_key>"
# MacOS/Linux
export OPENAI_API_KEY=<your_api_key>

Building an MCP Server
This server performs basic mathematical operations like addition and multiplication.
Create the Server File
First, create a new Python file:

touch math_server.py

Then, implement the server:

from mcp.server.fastmcp import FastMCP

# Initialize the server
mcp = FastMCP("Math")


@mcp.tool()
def add(a: int, b: int) -> int:
    return a + b


@mcp.tool()
def multiply(a: int, b: int) -> int:
    return a * b


if __name__ == "__main__":
    mcp.run(transport="stdio")

Your MCP server is now ready to run.

Building an MCP Client
This client connects to the MCP server and interacts with it.
Create the Client File
First, create a new file:

touch client.py

Then, implement the client:

import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client
from langchain_mcp_adapters.tools import load_mcp_tools
from langchain_openai import ChatOpenAI
from langgraph.prebuilt import create_react_agent

# Define server parameters
server_params = StdioServerParameters(
    command="python",
    args=["math_server.py"],
)

# Define the model
model = ChatOpenAI(model="gpt-4o")


async def run_agent():
    async with stdio_client(server_params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await load_mcp_tools(session)
            agent = create_react_agent(model, tools)
            agent_response = await agent.ainvoke({"messages": "what's (4 + 6) x 14?"})
            return agent_response["messages"][3].content


if __name__ == "__main__":
    result = asyncio.run(run_agent())
    print(result)

Your client is now set up and ready to interact with the MCP server.

Running the MCP Server and Client
Step 1: Start the MCP Server
Open a terminal and run:

python math_server.py

This starts the MCP server, making it available for client connections.
Step 2: Run the MCP Client
In another terminal, run:

python client.py

Expected Output

140

This means the AI agent correctly computed (4 + 6) x 14 using both the MCP server and GPT-4o.

Conclusion
Integrating MCP with Azure OpenAI Services enables AI applications to securely interact with external tools, enhancing functionality beyond text-based responses. With standardized communication and improved AI capabilities, developers can build smarter and more interactive AI-powered solutions. By following this guide, you can set up an MCP server and client, unlocking the full potential of AI with structured external interactions.

Next Steps:
Explore more MCP tools and integrations.
Extend your MCP setup to work with additional APIs.
Deploy your solution in a cloud environment for broader accessibility.
For further details, visit the GitHub repository for MCP integration examples and best practices.

MCP GitHub Repository
MCP Documentation
Semantic Workbench
AI Gateway
MCP Video Walkthrough
MCP Blog
MCP Github End to End Demo
Combating Digitally Altered Images: Deepfake Detection
In today's digital age, the rise of deepfake technology poses significant threats to credibility, privacy, and security. This article delves into our Deepfake Detection Project, a robust solution designed to combat the misuse of AI-generated content.

How to use DefaultAzureCredential across multiple tenants
If you are using the DefaultAzureCredential class from the Azure Identity SDK while your user account is associated with multiple tenants, you may find yourself frequently running into API authentication errors (such as HTTP 401/Unauthorized). This post is for you! These are your two options for successful authentication from a non-default tenant:
Set up your environment precisely to force DefaultAzureCredential to use the desired tenant.
Use a specific credential class and explicitly pass in the desired tenant ID.

Option 1: Get DefaultAzureCredential working
The DefaultAzureCredential class is a credential chain, which means that it tries a sequence of credential classes until it finds one that can authenticate successfully. The current sequence is:
EnvironmentCredential
WorkloadIdentityCredential
ManagedIdentityCredential
SharedTokenCacheCredential
AzureCliCredential
AzurePowerShellCredential
AzureDeveloperCliCredential
InteractiveBrowserCredential
For example, on my personal machine, only two of those credentials can retrieve tokens:
AzureCliCredential: from logging in with the Azure CLI (az login)
AzureDeveloperCliCredential: from logging in with the Azure Developer CLI (azd auth login)
Many developers are logged in with those two credentials, so it's crucial to understand how this chained credential works. The AzureCliCredential is earlier in the chain, so if you are logged in with that, you must have the desired tenant set as the "active tenant". According to the Azure CLI documentation, there are two ways to set the active tenant:
az account set --subscription SUBSCRIPTION-ID, where the subscription is from the desired tenant
az login --tenant TENANT-ID, with no subsequent az login commands after
Whichever option you choose, you can confirm that your desired tenant is currently the default by running az account show and verifying the tenantId in the account details shown.
If you are only logged in with the azd CLI and not the Azure CLI, you have a problem: the azd CLI does not currently have a way to set the active tenant. If that credential is called with no additional information, azd assumes your home tenant, which may not be desired. The azd credential does check for a system variable called AZURE_TENANT_ID, however, so you can try setting that in your environment before running code that uses DefaultAzureCredential. That should work as long as the DefaultAzureCredential code is truly running in the same environment where AZURE_TENANT_ID has been set.

Option 2: Use specific credentials
For this second option, we will abandon DefaultAzureCredential entirely and replace it with specific credential classes. This approach requires a code change, but once you've gone through the effort to set it up, it's generally a more predictable experience.

Use CLI credential for local dev
Several of the credential classes allow you to explicitly pass in a tenant ID, including both the AzureCliCredential and AzureDeveloperCliCredential.
If you know that you're always going to be logging in with a specific CLI, you can change your code to that credential. For example, in the Python SDK:

AzureDeveloperCliCredential(tenant_id=os.environ["AZURE_TENANT_ID"])

For more flexibility, you can use conditionals to only pass in a tenant ID if one is set in the environment:

if AZURE_TENANT_ID := os.getenv("AZURE_TENANT_ID"):
    cred = AzureDeveloperCliCredential(tenant_id=AZURE_TENANT_ID)
else:
    cred = AzureDeveloperCliCredential()

💁🏼♀️ Tip: As a best practice, I always like to add logging statements that note exactly what credential I'm calling and whether I'm passing in a tenant ID, to help me spot misconfigurations from the logs.

Use managed identity credential in production
You must be careful when replacing DefaultAzureCredential if your code will also be deployed to a production host. In that case, your code was previously relying on DefaultAzureCredential using the ManagedIdentityCredential in the chain, and you now need to call that credential class directly. You will also need to pass in the managed identity client ID if your host is using a user-assigned identity instead of a system-assigned identity. For example, using managed identity in the Python SDK with a user-assigned identity:

ManagedIdentityCredential(client_id=os.environ["AZURE_CLIENT_ID"])

Here's a full credential setup for an app that works locally with azd and works in production with managed identity (either system or user-assigned):

if RUNNING_ON_AZURE:
    if AZURE_CLIENT_ID := os.getenv("AZURE_CLIENT_ID"):
        cred = ManagedIdentityCredential(client_id=AZURE_CLIENT_ID)
    else:
        cred = ManagedIdentityCredential()
elif AZURE_TENANT_ID := os.getenv("AZURE_TENANT_ID"):
    cred = AzureDeveloperCliCredential(tenant_id=AZURE_TENANT_ID)
else:
    cred = AzureDeveloperCliCredential()
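Whichever branch is selected, a quick smoke test can confirm that the chosen credential actually authenticates against the tenant you expect. This is a minimal sketch, assuming cred was built as above:

# Request a token for Azure Resource Manager as a smoke test;
# if the wrong tenant is active, this typically fails with an authentication error
token = cred.get_token("https://management.azure.com/.default")
print("Token acquired, expires at:", token.expires_on)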
For a full walkthrough of an end-to-end template that uses keyless auth in multiple languages, check out my colleague's tutorials on using keyless auth in AI apps.

Understanding Azure OpenAI Service Quotas and Limits: A Beginner-Friendly Guide
Azure OpenAI Service allows developers, researchers, and students to integrate powerful AI models like GPT-4, GPT-3.5, and DALL·E into their applications. But with great power comes great responsibility and limits. Before you dive into building your next AI-powered solution, it's crucial to understand how quotas and limits work in the Azure OpenAI ecosystem. This guide is designed to help students and beginners easily understand quotas and limits and how to manage them effectively.

What Are Quotas and Limits?
Think of Azure's quotas as your "AI data pack": they define how much of the service you can use. Limits, meanwhile, are hard boundaries set by Azure to ensure fair use and system stability.
Quota: the maximum number of resources (e.g., tokens, requests) allocated to your Azure subscription.
Limit: the technical cap imposed by Azure on specific resources (e.g., number of files, deployments).

Key Metrics: TPM & RPM
Tokens Per Minute (TPM)
TPM refers to how many tokens you can use per minute across all your requests in each region. A token is a chunk of text; for example, the word "Hello" is 1 token, but "Understanding" might be 2 tokens. Each model has its own default TPM. Example: GPT-4 might allow 240,000 tokens per minute. You can split this quota across multiple deployments.
Requests Per Minute (RPM)
RPM defines how many API requests you can make every minute. For instance, GPT-3.5-turbo might allow 350 RPM, while DALL·E image generation models might allow 6 RPM.

Deployment, File, and Training Limits
Here are some standard limits imposed on your OpenAI resource:
Standard model deployments: 32
Fine-tuned model deployments: 5
Training jobs: 100 total per resource (1 active at a time)
Fine-tuning files: 50 files (total size: 1 GB)
Max prompt tokens per request: varies by model (e.g., 4096 tokens for GPT-3.5)

How to View and Manage Your Quota
Step-by-step:
Go to the Azure Portal.
Navigate to your Azure OpenAI resource.
Click on "Usage + quotas" in the left-hand menu.
You will see TPM, RPM, and current usage status.
To request more quota:
In the same "Usage + quotas" panel, click on "Request quota increase".
Fill in the form: select the region, choose the model family (e.g., GPT-4, GPT-3.5), and enter the desired TPM and RPM values.
Submit and wait for Azure to review and approve.

What is Dynamic Quota?
Sometimes, Azure gives you extra quota based on demand and availability. Dynamic quota is not guaranteed and may increase or decrease, so it is useful for short-term spikes but should not be relied on for production apps. Example: during weekends, your GPT-3.5 TPM may temporarily increase if there's less traffic in your region.

Best Practices for Students
Monitor regularly: use the Azure Portal to keep an eye on your usage.
Batch requests: combine multiple tasks in one API call to save tokens.
Start small: begin with GPT-3.5 before requesting GPT-4 access.
Plan ahead: if you're preparing a demo or a project, request quota in advance.
Handle limits gracefully: your code should handle 429 Too Many Requests errors (a retry sketch follows below).
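As an illustration of that last point, here is a minimal retry sketch with exponential backoff, assuming the openai Python package and an already configured AzureOpenAI client named client; the deployment name is a placeholder:

import time
from openai import RateLimitError

def chat_with_retry(client, messages, deployment="gpt-35-turbo", max_retries=5):
    """Calls the chat completions API, backing off when a 429 is returned."""
    delay = 2  # seconds; doubled after each 429
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(model=deployment, messages=messages)
        except RateLimitError:
            if attempt == max_retries - 1:
                raise  # give up after the final attempt
            time.sleep(delay)
            delay *= 2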
Quick Resources
Azure OpenAI Quotas and Limits
How to Request Quota in Azure
Join the Conversation on Azure AI Foundry Discussions! Have ideas, questions, or insights about AI? Don't keep them to yourself! Share your thoughts, engage with experts, and connect with a community that's shaping the future of artificial intelligence. 🧠✨ 👉 Click here to join the discussion!

Unlocking the Power of Azure: Mastering Resource Management in Kubernetes
Hi, I'm Pranjal Mishra, a Student Ambassador from Galgotias University specializing in AI & ML. Passionate about cloud computing and DevOps, I often explore how platforms like Azure streamline infrastructure challenges, especially with Kubernetes. In this article, we explore resource management in Kubernetes using Azure Kubernetes Service (AKS), focusing on setting resource limits and quotas to optimize cost, performance, and stability. We walk through creating namespaces, setting default CPU/memory requests and limits, and applying resource quotas (a brief sketch follows below). By using tools like Azure Monitor, Azure Policy, and Virtual Nodes, teams can ensure their containerized applications are efficient, resilient, and cost-effective. Whether you're new to AKS or looking to refine your DevOps practices, this guide offers practical steps and real-world context to get you started.
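The AKS walkthrough itself lives in the linked article, but as a rough idea of what "default requests, limits, and quotas" means in practice, here is a minimal sketch using the official kubernetes Python client against whatever cluster your kubeconfig points to; the namespace name and the specific CPU/memory values are arbitrary examples:

from kubernetes import client, config

config.load_kube_config()  # assumes kubectl is already configured for your AKS cluster
core = client.CoreV1Api()

ns = "team-a"  # example namespace name
core.create_namespace(client.V1Namespace(metadata=client.V1ObjectMeta(name=ns)))

# Default CPU/memory requests and limits for containers that do not declare their own
limit_range = client.V1LimitRange(
    metadata=client.V1ObjectMeta(name="team-a-defaults"),
    spec=client.V1LimitRangeSpec(limits=[client.V1LimitRangeItem(
        type="Container",
        default={"cpu": "500m", "memory": "512Mi"},
        default_request={"cpu": "250m", "memory": "256Mi"},
    )]),
)
core.create_namespaced_limit_range(namespace=ns, body=limit_range)

# Cap the total resources the namespace can request overall
quota = client.V1ResourceQuota(
    metadata=client.V1ObjectMeta(name="team-a-quota"),
    spec=client.V1ResourceQuotaSpec(hard={
        "requests.cpu": "4", "requests.memory": "8Gi",
        "limits.cpu": "8", "limits.memory": "16Gi",
    }),
)
core.create_namespaced_resource_quota(namespace=ns, body=quota)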