openai
54 TopicsPower Up Your Open WebUI with Azure AI Speech: Quick STT & TTS Integration
Introduction Ever found yourself wishing your web interface could really talk and listen back to you? With a few clicks (and a bit of code), you can turn your plain Open WebUI into a full-on voice assistant. In this post, you’ll see how to spin up an Azure Speech resource, hook it into your frontend, and watch as user speech transforms into text and your app’s responses leap off the screen in a human-like voice. By the end of this guide, you’ll have a voice-enabled web UI that actually converses with users, opening the door to hands-free controls, better accessibility, and a genuinely richer user experience. Ready to make your web app speak? Let’s dive in. Why Azure AI Speech? We use Azure AI Speech service in Open Web UI to enable voice interactions directly within web applications. This allows users to: Speak commands or input instead of typing, making the interface more accessible and user-friendly. Hear responses or information read aloud, which improves usability for people with visual impairments or those who prefer audio. Provide a more natural and hands-free experience especially on devices like smartphones or tablets. In short, integrating Azure AI Speech service into Open Web UI helps make web apps smarter, more interactive, and easier to use by adding speech recognition and voice output features. If you haven’t hosted Open WebUI already, follow my other step-by-step guide to host Ollama WebUI on Azure. Proceed to the next step if you have Open WebUI deployed already. Learn More about OpenWeb UI here. Deploy Azure AI Speech service in Azure. Navigate to the Azure Portal and search for Azure AI Speech on the Azure portal search bar. Create a new Speech Service by filling up the fields in the resource creation page. Click on “Create” to finalize the setup. After the resource has been deployed, click on “View resource” button and you should be redirected to the Azure AI Speech service page. The page should display the API Keys and Endpoints for Azure AI Speech services, which you can use in Open Web UI. Settings things up in Open Web UI Speech to Text settings (STT) Head to the Open Web UI Admin page > Settings > Audio. Paste the API Key obtained from the Azure AI Speech service page into the API key field below. Unless you use different Azure Region, or want to change the default configurations for the STT settings, leave all settings to blank. Text to Speech settings (TTS) Now, let's proceed with configuring the TTS Settings on OpenWeb UI by toggling the TTS Engine to Azure AI Speech option. Again, paste the API Key obtained from Azure AI Speech service page and leave all settings to blank. You can change the TTS Voice from the dropdown selection in the TTS settings as depicted in the image below: Click Save to reflect the change. Expected Result Now, let’s test if everything works well. Open a new chat / temporary chat on Open Web UI and click on the Call / Record button. The STT Engine (Azure AI Speech) should identify your voice and provide a response based on the voice input. To test the TTS feature, click on the Read Aloud (Speaker Icon) under any response from Open Web UI. The TTS Engine should reflect Azure AI Speech service! Conclusion And that’s a wrap! You’ve just given your Open WebUI the gift of capturing user speech, turning it into text, and then talking right back with Azure’s neural voices. Along the way you saw how easy it is to spin up a Speech resource in the Azure portal, wire up real-time transcription in the browser, and pipe responses through the TTS engine. From here, it’s all about experimentation. Try swapping in different neural voices or dialing in new languages. Tweak how you start and stop listening, play with silence detection, or add custom pronunciation tweaks for those tricky product names. Before you know it, your interface will feel less like a web page and more like a conversation partner.100Views0likes0CommentsStep-by-step: Integrate Ollama Web UI to use Azure Open AI API with LiteLLM Proxy
Introductions Ollama WebUI is a streamlined interface for deploying and interacting with open-source large language models (LLMs) like Llama 3 and Mistral, enabling users to manage models, test them via a ChatGPT-like chat environment, and integrate them into applications through Ollama’s local API. While it excels for self-hosted models on platforms like Azure VMs, it does not natively support Azure OpenAI API endpoints—OpenAI’s proprietary models (e.g., GPT-4) remain accessible only through OpenAI’s managed API. However, tools like LiteLLM bridge this gap, allowing developers to combine Ollama-hosted models with OpenAI’s API in hybrid workflows, while maintaining compliance and cost-efficiency. This setup empowers users to leverage both self-managed open-source models and cloud-based AI services. Problem Statement As of February 2025, Ollama WebUI, still do not support Azure Open AI API. The Ollama Web UI only support self-hosted Ollama API and managed OpenAI API service (PaaS). This will be an issue if users want to use Open AI models they already deployed on Azure AI Foundry. Objective To integrate Azure OpenAI API via LiteLLM proxy into with Ollama Web UI. LiteLLM translates Azure AI API requests into OpenAI-style requests on Ollama Web UI allowing users to use OpenAI models deployed on Azure AI Foundry. If you haven’t hosted Ollama WebUI already, follow my other step-by-step guide to host Ollama WebUI on Azure. Proceed to the next step if you have Ollama WebUI deployed already. Step 1: Deploy OpenAI models on Azure Foundry. If you haven’t created an Azure AI Hub already, search for Azure AI Foundry on Azure, and click on the “+ Create” button > Hub. Fill out all the empty fields with the appropriate configuration and click on “Create”. After the Azure AI Hub is successfully deployed, click on the deployed resources and launch the Azure AI Foundry service. To deploy new models on Azure AI Foundry, find the “Models + Endpoints” section on the left hand side and click on “+ Deploy Model” button > “Deploy base model” A popup will appear, and you can choose which models to deploy on Azure AI Foundry. Please note that the o-series models are only available to select customers at the moment. You can request access to the o-series models by completing this request access form, and wait until Microsoft approves the access request. Click on “Confirm” and another popup will emerge. Now name the deployment and click on “Deploy” to deploy the model. Wait a few moments for the model to deploy. Once it successfully deployed, please save the “Target URI” and the API Key. Step 2: Deploy LiteLLM Proxy via Docker Container Before pulling the LiteLLM Image into the host environment, create a file named “litellm_config.yaml” and list down the models you deployed on Azure AI Foundry, along with the API endpoints and keys. Replace "API_Endpoint" and "API_Key" with “Target URI” and “Key” found from Azure AI Foundry respectively. Template for the “litellm_config.yaml” file. model_list: - model_name: [model_name] litellm_params: model: azure/[model_name_on_azure] api_base: "[API_ENDPOINT/Target_URI]" api_key: "[API_Key]" api_version: "[API_Version]" Tips: You can find the API version info at the end of the Target URI of the model's endpoint: Sample Endpoint - https://example.openai.azure.com/openai/deployments/o1-mini/chat/completions?api-version=2024-08-01-preview Run the docker command below to start LiteLLM Proxy with the correct settings: docker run -d \ -v $(pwd)/litellm_config.yaml:/app/config.yaml \ -p 4000:4000 \ --name litellm-proxy-v1 \ --restart always \ ghcr.io/berriai/litellm:main-latest \ --config /app/config.yaml --detailed_debug Make sure to run the docker command inside the directory where you created the “litellm_config.yaml” file just now. The port used to listen for LiteLLM Proxy traffic is port 4000. Now that LiteLLM proxy had been deployed on port 4000, lets change the OpenAI API settings on Ollama WebUI. Navigate to Ollama WebUI’s Admin Panel settings > Settings > Connections > Under the OpenAI API section, write http://127.0.0.1:4000 as the API endpoint and set any key (You must write anything to make it work!). Click on “Save” button to reflect the changes. Refresh the browser and you should be able to see the AI models deployed on the Azure AI Foundry listed in the Ollama WebUI. Now let’s test the chat completion + Web Search capability using the "o1-mini" model on Ollama WebUI. Conclusion Hosting Ollama WebUI on an Azure VM and integrating it with OpenAI’s API via LiteLLM offers a powerful, flexible approach to AI deployment, combining the cost-efficiency of open-source models with the advanced capabilities of managed cloud services. While Ollama itself doesn’t support Azure OpenAI endpoints, the hybrid architecture empowers IT teams to balance data privacy (via self-hosted models on Azure AI Foundry) and cutting-edge performance (using Azure OpenAI API), all within Azure’s scalable ecosystem. This guide covers every step required to deploy your OpenAI models on Azure AI Foundry, set up the required resources, deploy LiteLLM Proxy on your host machine and configure Ollama WebUI to support Azure AI endpoints. You can test and improve your AI model even more with the Ollama WebUI interface with Web Search, Text-to-Image Generation, etc. all in one place.4.3KViews1like2CommentsHow to build Tool-calling Agents with Azure OpenAI and Lang Graph
Introducing MyTreat Our demo is a fictional website that shows customers their total bill in dollars, but they have the option of getting the total bill in their local currencies. The button sends a request to the Node.js service and a response is simply returned from our Agent given the tool it chooses. Let’s dive in and understand how this works from a broader perspective. Prerequisites An active Azure subscription. You can sign up for a free trial here or get $100 worth of credits on Azure every year if you are a student. A GitHub account (not necessarily) Node.js LTS 18 + VS Code installed (or your favorite IDE) Basic knowledge of HTML, CSS, JS Creating an Azure OpenAI Resource Go over to your browser and key in portal.azure.com to access the Microsoft Azure Portal. Over there navigate to the search bar and type Azure OpenAI. Go ahead and click on + Create. Fill in the input boxes with appropriate, for example, as shown below then press on next until you reach review and submit then finally click on Create. After the deployment is done, go to the deployment and access Azure AI Foundry portal using the button as show below. You can also use the link as demonstrated below. In the Azure AI Foundry portal, we have to create our model instance so we have to go over to Model Catalog on the left panel beneath Get Started. Select a desired model, in this case I used gpt-35-turbo for chat completion (in your case use gpt-4o). Below is a way of doing this. Choose a model (gpt-4o) Click on deploy Give the deployment a new name e.g. myTreatmodel, then click deploy and wait for it to finish On the left panel go over to deployments and you will see the model you have created. Access your Azure OpenAI Resource Key Go back to Azure portal and specifically to the deployment instance that we have and select on the left panel, Resource Management. Click on Keys and Endpoints. Copy any of the keys as shown below and keep it very safe as we will use it in our .env file. Configuring your project Create a new project folder on your local machine and add these variables to the .env file in the root folder. AZURE_OPENAI_API_INSTANCE_NAME= AZURE_OPENAI_API_DEPLOYMENT_NAME= AZURE_OPENAI_API_KEY= AZURE_OPENAI_API_VERSION="2024-08-01-preview" LANGCHAIN_TRACING_V2="false" LANGCHAIN_CALLBACKS_BACKGROUND = "false" PORT=4556 Starting a new project Go over to https://github.com/tiprock-network/mytreat.git and follow the instructions to setup the new project, if you do not have git installed, go over to the Code button and press Download ZIP. This will enable you get the project folder and follow the same procedure for setting up. Creating a custom tool In the utils folder the math tool was created, this code show below uses tool from Langchain to build a tool and the schema of the tool is created using zod.js, a library that helps in validating an object’s property value. The price function takes in an array of prices and the exchange rate, adds the prices up and converts them using the exchange rate as shown below. import { tool } from '@langchain/core/tools' import { z } from 'zod' const priceConv = tool((input) =>{ //get the prices and add them up after turning each into let sum = 0 input.prices.forEach((price) => { let price_check = parseFloat(price) sum += price_check }) //now change the price using exchange rate let final_price = parseFloat(input.exchange_rate) * sum //return return final_price },{ name: 'add_prices_and_convert', description: 'Add prices and convert based on exchange rate.', schema: z.object({ prices: z.number({ required_error: 'Price should not be empty.', invalid_type_error: 'Price must be a number.' }).array().nonempty().describe('Prices of items listed.'), exchange_rate: z.string().describe('Current currency exchange rate.') }) }) export { priceConv } Utilizing the tool In the controller’s folder we then bring the tool in by importing it. After that we pass it in to our array of tools. Notice that we have the Tavily Search Tool, you can learn how to implement in the Additional Reads Section or just remove it. Agent Model and the Call Process This code defines an AI agent using LangGraph and LangChain.js, powered by GPT-4o from Azure OpenAI. It initializes a ToolNode to manage tools like priceConv and binds them to the agent model. The StateGraph handles decision-making, determining whether the agent should call a tool or return a direct response. If a tool is needed, the workflow routes the request accordingly; otherwise, the agent responds to the user. The callModel function invokes the agent, processing messages and ensuring seamless tool integration. The searchAgentController is a GET endpoint that accepts user queries (text_message). It processes input through the compiled LangGraph workflow, invoking the agent to generate a response. If a tool is required, the agent calls it before finalizing the output. The response is then sent back to the user, ensuring dynamic and efficient tool-assisted reasoning. //create tools the agent will use //const agentTools = [new TavilySearchResults({maxResults:5}), priceConv] const agentTools = [ priceConv] const toolNode = new ToolNode(agentTools) const agentModel = new AzureChatOpenAI({ model:'gpt-4o', temperature:0, azureOpenAIApiKey: AZURE_OPENAI_API_KEY, azureOpenAIApiInstanceName:AZURE_OPENAI_API_INSTANCE_NAME, azureOpenAIApiDeploymentName:AZURE_OPENAI_API_DEPLOYMENT_NAME, azureOpenAIApiVersion:AZURE_OPENAI_API_VERSION }).bindTools(agentTools) //make a decision to continue or not const shouldContinue = ( state ) => { const { messages } = state const lastMessage = messages[messages.length -1] //upon tool call we go to tools if("tool_calls" in lastMessage && Array.isArray(lastMessage.tool_calls) && lastMessage.tool_calls?.length) return "tools"; //if no tool call is made we stop and return back to the user return END } const callModel = async (state) => { const response = await agentModel.invoke(state.messages) return { messages: [response] } } //define a new graph const workflow = new StateGraph(MessagesAnnotation) .addNode("agent", callModel) .addNode("tools", toolNode) .addEdge(START, "agent") .addConditionalEdges("agent", shouldContinue, ["tools", END]) .addEdge("tools", "agent") const appAgent = workflow.compile() The above is implemented with the following code: Frontend The frontend is a simple HTML+CSS+JS stack that demonstrated how you can use an API to integrate this AI Agent to your website. It sends a GET request and uses the response to get back the right answer. Below is an illustration of how fetch API has been used. const searchAgentController = async ( req, res ) => { //get human text const { text_message } = req.query if(!text_message) return res.status(400).json({ message:'No text sent.' }) //invoke the agent const agentFinalState = await appAgent.invoke( { messages: [new HumanMessage(text_message)] }, {streamMode: 'values'} ) //const agentFinalState_b = await agentModel.invoke(text_message) /*return res.status(200).json({ answer:agentFinalState.messages[agentFinalState.messages.length - 1].content })*/ //console.log(agentFinalState_b.tool_calls) res.status(200).json({ text: agentFinalState.messages[agentFinalState.messages.length - 1].content }) } There you go! We have created a basic tool-calling agent using Azure and Langchain successfully, go ahead and expand the code base to your liking. If you have questions you can comment below or reach out on my socials. Additional Reads Azure Open AI Service Models Generative AI for Beginners AI Agents for Beginners Course Lang Graph Tutorial Develop Generative AI Apps in Azure AI Foundry Portal1.7KViews0likes1CommentUnleashing the Power of Model Context Protocol (MCP): A Game-Changer in AI Integration
Artificial Intelligence is evolving rapidly, and one of the most pressing challenges is enabling AI models to interact effectively with external tools, data sources, and APIs. The Model Context Protocol (MCP) solves this problem by acting as a bridge between AI models and external services, creating a standardized communication framework that enhances tool integration, accessibility, and AI reasoning capabilities. What is Model Context Protocol (MCP)? MCP is a protocol designed to enable AI models, such as Azure OpenAI models, to interact seamlessly with external tools and services. Think of MCP as a universal USB-C connector for AI, allowing language models to fetch information, interact with APIs, and execute tasks beyond their built-in knowledge. Key Features of MCP Standardized Communication – MCP provides a structured way for AI models to interact with various tools. Tool Access & Expansion – AI assistants can now utilize external tools for real-time insights. Secure & Scalable – Enables safe and scalable integration with enterprise applications. Multi-Modal Integration – Supports STDIO, SSE (Server-Sent Events), and WebSocket communication methods. MCP Architecture & How It Works MCP follows a client-server architecture that allows AI models to interact with external tools efficiently. Here’s how it works: Components of MCP MCP Host – The AI model (e.g., Azure OpenAI GPT) requesting data or actions. MCP Client – An intermediary service that forwards the AI model's requests to MCP servers. MCP Server – Lightweight applications that expose specific capabilities (APIs, databases, files, etc.). Data Sources – Various backend systems, including local storage, cloud databases, and external APIs. Data Flow in MCP The AI model sends a request (e.g., "fetch user profile data"). The MCP client forwards the request to the appropriate MCP server. The MCP server retrieves the required data from a database or API. The response is sent back to the AI model via the MCP client. Integrating MCP with Azure OpenAI Services Microsoft has integrated MCP with Azure OpenAI Services, allowing GPT models to interact with external services and fetch live data. This means AI models are no longer limited to static knowledge but can access real-time information. Benefits of Azure OpenAI Services + MCP Integration ✔ Real-time Data Fetching – AI assistants can retrieve fresh information from APIs, databases, and internal systems. ✔ Contextual AI Responses – Enhances AI responses by providing accurate, up-to-date information. ✔ Enterprise-Ready – Secure and scalable for business applications, including finance, healthcare, and retail. Hands-On Tools for MCP Implementation To implement MCP effectively, Microsoft provides two powerful tools: Semantic Workbench and AI Gateway. Microsoft Semantic Workbench A development environment for prototyping AI-powered assistants and integrating MCP-based functionalities. Features: Build and test multi-agent AI assistants. Configure settings and interactions between AI models and external tools. Supports GitHub Codespaces for cloud-based development. Explore Semantic Workbench Workbench interface examples Microsoft AI Gateway A plug-and-play interface that allows developers to experiment with MCP using Azure API Management. Features: Credential Manager – Securely handle API credentials. Live Experimentation – Test AI model interactions with external tools. Pre-built Labs – Hands-on learning for developers. Explore AI Gateway Setting Up MCP with Azure OpenAI Services Step 1: Create a Virtual Environment First, create a virtual environment using Python: python -m venv .venv Activate the environment: # Windows venv\Scripts\activate # MacOS/Linux source .venv/bin/activate Step 2: Install Required Libraries Create a requirements.txt file and add the following dependencies: langchain-mcp-adapters langgraph langchain-openai Then, install the required libraries: pip install -r requirements.txt Step 3: Set Up OpenAI API Key Ensure you have your OpenAI API key set up: # Windows setx OPENAI_API_KEY "<your_api_key> # MacOS/Linux export OPENAI_API_KEY=<your_api_key> Building an MCP Server This server performs basic mathematical operations like addition and multiplication. Create the Server File First, create a new Python file: touch math_server.py Then, implement the server: from mcp.server.fastmcp import FastMCP # Initialize the server mcp = FastMCP("Math") MCP.tool() def add(a: int, b: int) -> int: return a + b MCP.tool() def multiply(a: int, b: int) -> int: return a * b if __name__ == "__main__": mcp.run(transport="stdio") Your MCP server is now ready to run. Building an MCP Client This client connects to the MCP server and interacts with it. Create the Client File First, create a new file: touch client.py Then, implement the client: import asyncio from mcp import ClientSession, StdioServerParameters from langchain_openai import ChatOpenAI from mcp.client.stdio import stdio_client # Define server parameters server_params = StdioServerParameters( command="python", args=["math_server.py"], ) # Define the model model = ChatOpenAI(model="gpt-4o") async def run_agent(): async with stdio_client(server_params) as (read, write): async with ClientSession(read, write) as session: await session.initialize() tools = await load_mcp_tools(session) agent = create_react_agent(model, tools) agent_response = await agent.ainvoke({"messages": "what's (4 + 6) x 14?"}) return agent_response["messages"][3].content if __name__ == "__main__": result = asyncio.run(run_agent()) print(result) Your client is now set up and ready to interact with the MCP server. Running the MCP Server and Client Step 1: Start the MCP Server Open a terminal and run: python math_server.py This starts the MCP server, making it available for client connections. Step 2: Run the MCP Client In another terminal, run: python client.py Expected Output 140 This means the AI agent correctly computed (4 + 6) x 14 using both the MCP server and GPT-4o. Conclusion Integrating MCP with Azure OpenAI Services enables AI applications to securely interact with external tools, enhancing functionality beyond text-based responses. With standardized communication and improved AI capabilities, developers can build smarter and more interactive AI-powered solutions. By following this guide, you can set up an MCP server and client, unlocking the full potential of AI with structured external interactions. Next Steps: Explore more MCP tools and integrations. Extend your MCP setup to work with additional APIs. Deploy your solution in a cloud environment for broader accessibility. For further details, visit the GitHub repository for MCP integration examples and best practices. MCP GitHub Repository MCP Documentation Semantic Workbench AI Gateway MCP Video Walkthrough MCP Blog MCP Github End to End Demo29KViews4likes4CommentsUnderstanding Azure OpenAI Service Quotas and Limits: A Beginner-Friendly Guide
Azure OpenAI Service allows developers, researchers, and students to integrate powerful AI models like GPT-4, GPT-3.5, and DALL·E into their applications. But with great power comes great responsibility and limits. Before you dive into building your next AI-powered solution, it's crucial to understand how quotas and limits work in the Azure OpenAI ecosystem. This guide is designed to help students and beginners easily understand the concept of quotas, limits, and how to manage them effectively. What Are Quotas and Limits? Think of Azure's quotas as your "AI data pack." It defines how much you can use the service. Meanwhile, limits are hard boundaries set by Azure to ensure fair use and system stability. Quota The maximum number of resources (e.g., tokens, requests) allocated to your Azure subscription. Limit The technical cap imposed by Azure on specific resources (e.g., number of files, deployments). Key Metrics: TPM & RPM Tokens Per Minute (TPM) TPM refers to how many tokens you can use per minute across all your requests in each region. A token is a chunk of text. For example, the word "Hello" is 1 token, but "Understanding" might be 2 tokens. Each model has its own default TPM. Example: GPT-4 might allow 240,000 tokens per minute. You can split this quota across multiple deployments. Requests Per Minute (RPM) RPM defines how many API requests you can make every minute. For instance, GPT-3.5-turbo might allow 350 RPM. DALL·E image generation models might allow 6 RPM. Deployment, File, and Training Limits Here are some standard limits imposed on your OpenAI resource: Resource Type Limit Standard model deployments 32 Fine-tuned model deployments 5 Training jobs 100 total per resource (1 active at a time) Fine-tuning files 50 files (total size: 1 GB) Max prompt tokens per request Varies by model (e.g., 4096 tokens for GPT-3.5) How to View and Manage Your Quota Step-by-Step: Go to the Azure Portal. Navigate to your Azure OpenAI resource. Click on "Usage + quotas" in the left-hand menu. You will see TPM, RPM, and current usage status. To Request More Quota: In the same "Usage + quotas" panel, click on "Request quota increase". Fill in the form: Select the region. Choose the model family (e.g., GPT-4, GPT-3.5). Enter the desired TPM and RPM values. Submit and wait for Azure to review and approve. What is Dynamic Quota? Sometimes, Azure gives you extra quota based on demand and availability. “Dynamic quota” is not guaranteed and may increase or decrease. Useful for short-term spikes but should not be relied on for production apps. Example: During weekends, your GPT-3.5 TPM may temporarily increase if there's less traffic in your region. Best Practices for Students Monitor Regularly: Use the Azure Portal to keep an eye on your usage. Batch Requests: Combine multiple tasks in one API call to save tokens. Start Small: Begin with GPT-3.5 before requesting GPT-4 access. Plan Ahead: If you're preparing a demo or a project, request quota in advance. Handle Limits Gracefully: Code should manage 429 Too Many Requests errors. Quick Resources Azure OpenAI Quotas and Limits How to Request Quota in Azure Join the Conversation on Azure AI Foundry Discussions! Have ideas, questions, or insights about AI? Don't keep them to yourself! Share your thoughts, engage with experts, and connect with a community that’s shaping the future of artificial intelligence. 🧠✨ 👉 Click here to join the discussion!555Views0likes0CommentsUnlock the Future of Secure Authentication: Moving to Keyless Authentication with Managed Identity
Why Managed Identity? Traditional authentication methods often rely on keys, secrets, and passwords that can be easily compromised. Managed identity, on the other hand, provides a secure and seamless way to authenticate without the need for managing credentials. By leveraging managed identity, you can: Reduce the Risk of Compromise: As most security breaches start from identity-related issues, moving to a keyless authentication system significantly reduces the chances of such compromises. Simplify Credential Management: Managed identity eliminates the need for managing keys and secrets, making the authentication process more straightforward and less error-prone. Enhance Security: With managed identity, your applications are granted access to resources securely, without the risk of exposing sensitive credentials. Getting Started with Managed Identity To help you get started with managed identity, Microsoft offers comprehensive training modules for different programming languages. These modules cover the basics of using managed identity to authenticate to Azure OpenAI, providing you with the knowledge and skills needed to implement secure authentication in your applications. Available Microsoft Learn Training Modules: Introduction to using Managed Identity to authenticate to Azure OpenAI with .NET - Training | Microsoft Learn Introduction to Azure OpenAI Managed Identity Authentication with Java - Training | Microsoft Learn Introduction to Azure OpenAI Managed Identity Authentication with Python - Training | Microsoft Learn Introduction to Azure OpenAI Managed Identity Authentication with JavaScript - Training | Microsoft Learn Why Should Students Learn Managed Identity? As a student, learning about managed identity and keyless authentication is not just about enhancing your technical skills; it's about preparing for the future. Here are a few reasons why you should dive into managed identity: Stay Ahead in the Job Market: With cybersecurity being a top priority for organizations, having expertise in secure authentication methods like managed identity will make you a valuable asset to potential employers. Build Secure Applications: By implementing managed identity, you can build applications that are more secure, reliable, and less susceptible to breaches. Understand Modern Security Practices: Gaining knowledge about managed identity and keyless authentication will give you a deeper understanding of modern security practices and how to protect applications in today's digital landscape. Conclusion In conclusion, moving to keyless authentication through managed identity is a game-changer for securing applications. As students and future developers, embracing this technology will not only enhance your skills but also contribute to building a safer and more secure digital world. So, take the first step today by exploring the training modules and mastering the art of managed identity!261Views3likes1CommentBuilding a Basic Chatbot with Azure OpenAI
Overview In this turorial, we'll build a simple chatbot that uses Azure OpenAI to generate responses to user queries. To create a basic chatbot, we need to set up a language model resource that enables conversation capabilities. In this tutorial, we will: Set up the Azure OpenAI resource using the Azure AI Foundry portal. Retrieve the API key needed to connect the resource to your chatbot application. Once the API key is configured in your code, you will be able to integrate the language model into your chatbot and enable it to generate responses. By the end of this tutorial, you'll have a working chatbot that can generate responses using the Azure OpenAI model. Signing In and Setting Up Your Azure AI Foundry Workspace Signing In to Azure AI Foundry Open the Azure AI Foundry page in your web browser. Login to your Azure account. If you don't have an account, you can sign up. Setting Up Your Azure AI Foundry Workspace Select + Create project to create a new project. Perform the following tasks: Enter Project name. It must be a unique value. Select Hub you'd like to use (create a new one if needed). Select Create. Setting Up the Azure OpenAI Resource in Azure AI Foundry In this step, you'll learn how to set up the Azure OpenAI resource in Azure AI Foundry. Azure OpenAI is a pre-trained language model that can generate responses to user queries. We'll be using it in our chatbot. Select Models + endpoints from the left side menu. On this page, you can deploy language models and set up Azure AI resources. In this step, we will deploy the Azure OpenAI GPT-4 language model. Select + Deploy model. Select Deploy base model. In this tutorial, we will deploy the GPT-4o model. Select GPT-4o. Select Confirm. Select Deploy. The model will be deployed. Once the deployment is complete, you will see the model listed on the Models + endpoints page. Now that the model is deployed, you can retrieve the API key needed to connect the model to your chatbot application. Select the model you deployed on the Models + endpoints page. ` On the model details page, you can view information about the model, including the API key. We will come back this page later to add the required information into the environment variables. Setting Up the Project and Install the Libraries Now, you will create a folder to work in and set up a virtual environment to develop a program. Creating a Folder to Work Inside It Open a terminal window and type the following command to create a folder named basic-chatbot in the default path. mkdir basic-chatbot Type the following command inside your terminal to navigate to the basic-chatbot folder you created. cd basic-chatbot Creating a Virtual Environment Type the following command inside your terminal to create a virtual environment named .venv. python -m venv .venv Type the following command inside your terminal to activate the virtual environment. .venv\Scripts\activate.bat NOTE If it worked, you should see (.venv) before the command prompt. Installing the Required Packages Type the following commands inside your terminal to install the required packages. openai: A Python library that provides integration with the Azure OpenAI API. python-dotenv: A Python library for managing environment variables stored in an .env file. pip install openai python-dotenv Setting up the Project in Visual Studio Code To create a basic chatbot program, you will need two files: example.py: This file will contain the code to interact with Azure resources. .env: This file will store the Azure credentials and configuration details. NOTE Purpose of the .env File The .env file is essential for storing the Azure information required to connect and use the resources you created. By keeping the Azure credentials in the .env file, you can ensure a secure and organized way to manage sensitive information. Setting Up example.py File Open Visual Studio Code. Select File from the menu bar. Select Open Folder. Select the basic-chatbot folder that you created, which is located at C:\Users\yourUserName\basic-chatbot. In the left pane of Visual Studio Code, right-click and select New File to create a new file named example.py. Add the following code to the example.py file to import the required libraries. from openai import AzureOpenAI from dotenv import load_dotenv import os # Load environment variables from the .env file load_dotenv() # Retrieve environment variables AZURE_OPENAI_ENDPOINT = os.getenv("AZURE_OPENAI_ENDPOINT") AZURE_OPENAI_API_KEY = os.getenv("AZURE_OPENAI_API_KEY") AZURE_OPENAI_MODEL_NAME = os.getenv("AZURE_OPENAI_MODEL_NAME") AZURE_OPENAI_CHAT_DEPLOYMENT_NAME = os.getenv("AZURE_OPENAI_CHAT_DEPLOYMENT_NAME") AZURE_OPENAI_API_VERSION = os.getenv("AZURE_OPENAI_API_VERSION") # Initialize Azure OpenAI client client = AzureOpenAI( api_key=AZURE_OPENAI_API_KEY, api_version=AZURE_OPENAI_API_VERSION, base_url=f"{AZURE_OPENAI_ENDPOINT}/openai/deployments/{AZURE_OPENAI_CHAT_DEPLOYMENT_NAME}" ) print("Chatbot: Hello! How can I assist you today? Type 'exit' to end the conversation.") while True: user_input = input("You: ") if user_input.lower() == "exit": print("Chatbot: Ending the conversation. Have a great day!") break response = client.chat.completions.create( model=AZURE_OPENAI_MODEL_NAME, messages=[ {"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": user_input} ], max_tokens=200 ) print("Chatbot:", response.choices[0].message.content.strip()) Setting Up .env File To set up your development environment, we will create a .env file and store the necessary credentials directly. NOTE Complete folder structure: └── YourUserName . └── basic-chatbot . ├── example.py . └── .env In the left pane of Visual Studio Code, right-click and select New File to create a new file named .env. Add the following code to the .env file to include your Azure information. AZURE_OPENAI_API_KEY=your_azure_openai_api_key AZURE_OPENAI_ENDPOINT=https://your_azure_openai_endpoint AZURE_OPENAI_MODEL_NAME=your_model_name AZURE_OPENAI_CHAT_DEPLOYMENT_NAME=your_deployment_name AZURE_OPENAI_API_VERSION=your_api_version Retrieving Environment Variables from Azure AI Foundry Now, you will retrieve the required information from Azure AI Foundry and update the .env file. Go to the Models + endpoints page and select your deployed model. On the Model Details page, copy the following information in to the .env file.: AZURE_OPENAI_API_KEY AZURE_OPENAI_ENDPOINT AZURE_OPENAI_MODEL_NAME AZURE_OPENAI_CHAT_DEPLOYMENT_NAME Paste this information into the .env file in the respective placeholders. Running the Chatbot Program Type the following command inside your terminal to run the program and see if it can answer questions. python example.py Interact with the chatbot by typing your questions or messages. The chatbot will generate responses based on the Azure OpenAI model you deployed. NOTE You can find the full example of this chatbot, including the code and .env template, in my GitHub repository: GitHub Repository1.3KViews2likes0CommentsUsing Advanced Reasoning Model on EdgeAI Part 1 - Quantization, Conversion, Performance
DeepSeek-R1 is very popular, and it can achieve the same capabilities as OpenAI o1 in advanced reasoning. Microsoft has also added DeepSeek-R1 models to Azure AI Foundry and GitHub Models. We can compare DeepSeek-R1 ith other available models through GitHub Models Playground Note This series revolves around deployment of SLMs to Edge Devices 'Edge AI' we will focus on the deployment advanced reasoning models, with different application scenarios. You can learn more in the following session AI Tour BRK453. In this experiement we want to deploy advanced reasoning models to the edge, so that they can run on edge devices with limited computing power and offline environments. At this time, the recommendation is to use the traditional ONNX model . We can use Microsoft Olive to convert the DeepSeek-R1 Distrill model. Getting started with Microsoft Olive is very straightforward. Install the Microsoft Olive library through the command line and Python 3.10+ (recommended) pip install olive-ai The DeepSeek-R1 Distrill model series has different parameters such as 1.5B, 7B, 8B, 14B, 32B, 70B, etc. This article is mainly based on the 1.5B, 7B, and 14B models (so a Small Language Model). CPU Inference Let's discuss 1.5B and 7B, which are models with lower parameter. We can directly use the CPU as computing for inference to test the effect (hardware environment Azure DevBox, AMD EPYC 7763 64-Core + 64GB Memory + 2T SSD) Quantization conversion olive auto-opt --model_name_or_path <Your DeepSeek-R1-Distill-Qwen-1.5B/7B local location> --output_path <Your Convert ONNX INT4 Model local location> --device cpu --provider CPUExecutionProvider --precision int4 --use_model_builder --log_level 1 You can download it directly from my Hugging face Repo (Note: This model is for testing and has not been fully tested by AI Content Safety or provided as an Offical Model) DeepSeek-R1-Distill-Qwen-1.5B-ONNX-INT4-CPU DeepSeek-R1-Distill-Qwen-7B-ONNX-INT4-CPU Running with ONNX Runtime GenAI Install ONNX Runtime GenAI and ONNX Runtime CPU support libraries pip install onnxruntime-genai pip install onnxruntime Sample Code https://github.com/kinfey/EdgeAIForAdvancedReasoning/blob/main/notebook/demo-1.5b.ipynb https://github.com/kinfey/EdgeAIForAdvancedReasoning/blob/main/notebook/demo-7b.ipynb Performance comparison 1.5B vs 7B We compare two different inference scenarios explain 1+1=2 1.5B quantized ONNX model memory occupied, time consumption and number of tokens generated: 7B quantized ONNX model memory occupied, time consumption and number of tokens generated 2. Find all pairwise different isomorphism groups with order 147 and no elements with order 49 1.5B quantized ONNX model memory occupied, time consumption and number of tokens generated: 7B quantized ONNX model memory occupied, time consumption and number of tokens generated Results of the numbers Through the test, we can see that the 1.5B model of DeepSeek is more suitable for use on CPU inference and can be deployed on traditional PCs or IoT devices. As for 7B, although it has better inference, it is not very effective on CPU operation. GPU Inference It is ideal if we have a GPU on the edge device. We can quantize and convert it to an ONNX model for CPU inference through Microsoft Olive. Of course, it can also be converted to a model for GPU inference. Here I take the 14B DeepSeek-R1-Distill-Qwen-14B as an example and make an inference comparison with Microsoft's Phi-4-14B Quantization conversion olive auto-opt --model_name_or_path <Your Phi-4-14B or DeepSeek-R1-Distill-Qwen-14B local path > --output_path <Your converted Phi-4-14B or DeepSeek-R1-Distill-Qwen-14B local path > --device gpu --provider CUDAExecutionProvider --precision int4 --use_model_builder --log_level 1 You can download it directly from my Hugging face Repo (Note: This model is for testing and has not been fully tested by AI Content Safety and not an Official Model) DeepSeek-R1-Distill-Qwen-14B-ONNX-INT4-GPU Phi-4-14B-ONNX-INT4-GPU Running with ONNX Runtime GenAI CUDA Install ONNX Runtime GenAI and ONNX Runtime GPU support libraries pip install onnxruntime-genai-cuda pip install onnxruntime-gpu Compare the results in the GPU environment with Gradio It is recommended to use a GPU with more than 8G memory To increase the comparison of the results, we compare it with Phi-4-14B-ONNX-INT4-GPU and DeepSeek-R1-Distill-Qwen-14B-ONNX-INT4-GPU to see the different results. We also show we use OpenAI o1-mini (it is recommended to use o1-mini through GitHub Models), Sample Code https://github.com/kinfey/EdgeAIForAdvancedReasoning/blob/main/notebook/Performance_AdvancedReasoning_ONNX_CPU.ipynb You can test any prompt on Gradio to compare the results of Phi-4-14B-ONNX-INT4-GPU, DeepSeek-R1-Distill-Qwen-14B-ONNX-INT4-GPU and OpenAI o1 mini. DeepSeek-R1 reduces the cost of inference models and produces more instructive results on professional problems, but Phi-4-14B also has advantages in reasoning and uses lower computing power to complete inference. As for OpenAI o1 mini, it is more comprehensive and can touch all problems. If you want to deploy to Edge Device, Phi-4-14B and quantized DeepSeek-R1 are good choices for you. This blog is just a simple test and the first in this series. Please share your feedback and continue the discussion in the Microsoft AI Discord Channel. Feel free to me a message or comment. We look forward to sharing more around the opportunity of EdgeAI and more content in this series. Resource DeepSeek-R1 in GitHub Models https://github.com/marketplace/models/azureml-deepseek/DeepSeek-R1 DeepSeek-R1 in Azure AI Foundry https://ai.azure.com/explore/models/DeepSeek-R1/version/1/registry/azureml-deepseek Phi-4-14B in Hugging face https://huggingface.co/microsoft/phi-4 Learn about Microsoft Olive https://github.com/microsoft/olive Learn about ONNX Runtime GenAI https://github.com/microsoft/onnxruntime-genai Microsoft AI Discord Channel BRK453 Exploring cutting-edge models: LLMs, SLMs, local development and more https://aka.ms/aitour/brk453815Views0likes0Comments