openai
125 TopicsFine-Tuning and Deploying Phi-3.5 Model with Azure and AI Toolkit
What is Phi-3.5? Phi-3.5 as a state-of-the-art language model with strong multilingual capabilities. Emphasize that it is designed to handle multiple languages with high proficiency, making it a versatile tool for Natural Language Processing (NLP) tasks across different linguistic backgrounds. Key Features of Phi-3.5 Highlight the core features of the Phi-3.5 model: Multilingual Capabilities: Explain that the model supports a wide variety of languages, including major world languages such as English, Spanish, Chinese, French, and others. You can provide an example of its ability to handle a sentence or document translation from one language to another without losing context or meaning. Fine-Tuning Ability: Discuss how the model can be fine-tuned for specific use cases. For instance, in a customer support setting, the Phi-3.5 model can be fine-tuned to understand the nuances of different languages used by customers across the globe, improving response accuracy. High Performance in NLP Tasks: Phi-3.5 is optimized for tasks like text classification, machine translation, summarization, and more. It has superior performance in handling large-scale datasets and producing coherent, contextually correct language outputs. Applications in Real-World Scenarios To make this section more engaging, provide a few real-world applications where the Phi-3.5 model can be utilized: Customer Support Chatbots: For companies with global customer bases, the model’s multilingual support can enhance chatbot capabilities, allowing for real-time responses in a customer’s native language, no matter where they are located. Content Creation for Global Markets: Discuss how businesses can use Phi-3.5 to automatically generate or translate content for different regions. For example, marketing copy can be adapted to fit cultural and linguistic nuances in multiple languages. Document Summarization Across Languages: Highlight how the model can be used to summarize long documents or articles written in one language and then translate the summary into another language, improving access to information for non-native speakers. Why Choose Phi-3.5 for Your Project? End this section by emphasizing why someone should use Phi-3.5: Versatility: It’s not limited to just one or two languages but performs well across many. Customization: The ability to fine-tune it for particular use cases or industries makes it highly adaptable. Ease of Deployment: With tools like Azure ML and Ollama, deploying Phi-3.5 in the cloud or locally is accessible even for smaller teams. Objective Of Blog Specialized Language Models (SLMs) are at the forefront of advancements in Natural Language Processing, offering fine-tuned, high-performance solutions for specific tasks and languages. Among these, the Phi-3.5 model has emerged as a powerful tool, excelling in its multilingual capabilities. Whether you're working with English, Spanish, Mandarin, or any other major world language, Phi-3.5 offers robust, reliable language processing that adapts to various real-world applications. This makes it an ideal choice for businesses looking to deploy multilingual chatbots, automate content generation, or translate customer interactions in real time. Moreover, its fine-tuning ability allows for customization, making Phi-3.5 versatile across industries and tasks. Customization and Fine-Tuning for Different Applications The Phi-3.5 model is not just limited to general language understanding tasks. It can be fine-tuned for specific applications, industries, and language models, allowing users to tailor its performance to meet their needs. Customizable for Industry-Specific Use Cases: With fine-tuning, the model can be trained further on domain-specific data to handle particular use cases like legal document translation, medical records analysis, or technical support. Example: A healthcare company can fine-tune Phi-3.5 to understand medical terminology in multiple languages, enabling it to assist in processing patient records or generating multilingual health reports. Adapting for Specialized Tasks: You can train Phi-3.5 to perform specialized tasks like sentiment analysis, text summarization, or named entity recognition in specific languages. Fine-tuning helps enhance the model's ability to handle unique text formats or requirements. Example: A marketing team can fine-tune the model to analyse customer feedback in different languages to identify trends or sentiment across various regions. The model can quickly classify feedback as positive, negative, or neutral, even in less widely spoken languages like Arabic or Korean. Applications in Real-World Scenarios To illustrate the versatility of Phi-3.5, here are some real-world applications where this model excels, demonstrating its multilingual capabilities and customization potential: Case Study 1: Multilingual Customer Support Chatbots Many global companies rely on chatbots to handle customer queries in real-time. With Phi-3.5’s multilingual abilities, businesses can deploy a single model that understands and responds in multiple languages, cutting down on the need to create language-specific chatbots. Example: A global airline can use Phi-3.5 to power its customer service bot. Passengers from different countries can inquire about their flight status or baggage policies in their native languages—whether it's Japanese, Hindi, or Portuguese—and the model responds accurately in the appropriate language. Case Study 2: Multilingual Content Generation Phi-3.5 is also useful for businesses that need to generate content in different languages. For example, marketing campaigns often require creating region-specific ads or blog posts in multiple languages. Phi-3.5 can help automate this process by generating localized content that is not just translated but adapted to fit the cultural context of the target audience. Example: An international cosmetics brand can use Phi-3.5 to automatically generate product descriptions for different regions. Instead of merely translating a product description from English to Spanish, the model can tailor the description to fit cultural expectations, using language that resonates with Spanish-speaking audiences. Case Study 3: Document Translation and Summarization Phi-3.5 can be used to translate or summarize complex documents across languages. Its ability to preserve meaning and context across languages makes it ideal for industries where accuracy is crucial, such as legal or academic fields. Example: A legal firm working on cross-border cases can use Phi-3.5 to translate contracts or legal briefs from German to English, ensuring the context and legal terminology are accurately preserved. It can also summarize lengthy documents in multiple languages, saving time for legal teams. Fine-Tuning Phi-3.5 Model Fine-tuning a language model like Phi-3.5 is a crucial step in adapting it to perform specific tasks or cater to specific domains. This section will walk through what fine-tuning is, its importance in NLP, and how to fine-tune the Phi-3.5 model using Azure Model Catalog for different languages and tasks. We'll also explore a code example and best practices for evaluating and validating the fine-tuned model. What is Fine-Tuning? Fine-tuning refers to the process of taking a pre-trained model and adapting it to a specific task or dataset by training it further on domain-specific data. In the context of NLP, fine-tuning is often required to ensure that the language model understands the nuances of a particular language, industry-specific terminology, or a specific use case. Why Fine-Tuning is Necessary Pre-trained Large Language Models (LLMs) are trained on diverse datasets and can handle various tasks like text summarization, generation, and question answering. However, they may not perform optimally in specialized domains without fine-tuning. The goal of fine-tuning is to enhance the model's performance on specific tasks by leveraging its prior knowledge while adapting it to new contexts. Challenges of Fine-Tuning Resource Intensiveness: Fine-tuning large models can be computationally expensive, requiring significant hardware resources. Storage Costs: Each fine-tuned model can be large, leading to increased storage needs when deploying multiple models for different tasks. LoRA and QLoRA To address these challenges, techniques like LoRA (Low-rank Adaptation) and QLoRA (Quantized Low-rank Adaptation) have emerged. Both methods aim to make the fine-tuning process more efficient: LoRA: This technique reduces the number of trainable parameters by introducing low-rank matrices into the model while keeping the original model weights frozen. This approach minimizes memory usage and speeds up the fine-tuning process. QLoRA: An enhancement of LoRA, QLoRA incorporates quantization techniques to further reduce memory requirements and increase the efficiency of the fine-tuning process. It allows for the deployment of large models on consumer hardware without the extensive resource demands typically associated with full fine-tuning. from transformers import AutoModelForSequenceClassification, Trainer, TrainingArguments from peft import get_peft_model, LoraConfig # Load a pre-trained model model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased") # Configure LoRA lora_config = LoraConfig( r=16, # Rank lora_alpha=32, lora_dropout=0.1, ) # Wrap the model with LoRA model = get_peft_model(model, lora_config) # Define training arguments training_args = TrainingArguments( output_dir="./results", evaluation_strategy="epoch", learning_rate=2e-5, per_device_train_batch_size=16, per_device_eval_batch_size=16, num_train_epochs=3, ) # Create a Trainer trainer = Trainer( model=model, args=training_args, train_dataset=train_dataset, eval_dataset=eval_dataset, ) # Start fine-tuning trainer.train() This code outlines how to set up a model for fine-tuning using LoRA, which can significantly reduce the resource requirements while still adapting the model effectively to specific tasks. In summary, fine-tuning with methods like LoRA and QLoRA is essential for optimizing pre-trained models for specific applications in NLP, making it feasible to deploy these powerful models in various domains efficiently. Why is Fine-Tuning Important in NLP? Task-Specific Performance: Fine-tuning helps improve performance for tasks like text classification, machine translation, or sentiment analysis in specific domains (e.g., legal, healthcare). Language-Specific Adaptation: Since models like Phi-3.5 are trained on general datasets, fine-tuning helps them handle industry-specific jargon or linguistic quirks. Efficient Resource Utilization: Instead of training a model from scratch, fine-tuning leverages pre-trained knowledge, saving computational resources and time. Steps to Fine-Tune Phi-3.5 in Azure AI Foundry Fine-tuning the Phi-3.5 model in Azure AI Foundry involves several key steps. Azure provides a user-friendly interface to streamline model customization, allowing you to quickly configure, train, and deploy models. Step 1: Setting Up the Environment in Azure AI Foundry Access Azure AI Foundry: Log in to Azure AI Foundry. If you don’t have an account, you can create one and set up a workspace. Create a New Experiment: Once in the Azure AI Foundry, create a new training experiment. Choose the Phi-3.5 model from the pre-trained models provided in the Azure Model Zoo. Set Up the Data for Fine-Tuning: Upload your custom dataset for fine-tuning. Ensure the dataset is in a compatible format (e.g., CSV, JSON). For instance, if you are fine-tuning the model for a customer service chatbot, you could upload customer queries in different languages. Step 2: Configure Fine-Tuning Settings Select the Training Dataset: Select the dataset you uploaded and link it to the Phi-3.5 model. 2) Configure the Hyperparameters: Set up training hyperparameters like the number of epochs, learning rate, and batch size. You may need to experiment with these settings to achieve optimal performance. 3) Choose the Task Type: Specify the task you are fine-tuning for, such as text classification, translation, or summarization. This helps Azure AI Foundry understand how to optimize the model during fine-tuning. 4) Fine-Tuning for Specific Languages: If you are fine-tuning for a specific language or multilingual tasks, ensure that the dataset is labeled appropriately and contains enough examples in the target language(s). This will allow Phi-3.5 to learn language-specific features effectively. Step 3: Train the Model Launch the Training Process: Once the configuration is complete, launch the training process in Azure AI Foundry. Depending on the size of your dataset and the complexity of the model, this could take some time. Monitor Training Progress: Use Azure AI Foundry’s built-in monitoring tools to track performance metrics such as loss, accuracy, and F1 score. You can view the model’s progress during training to ensure that it is learning effectively. Code Example: Fine-Tuning Phi-3.5 for a Specific Use Case Here's a code snippet for fine-tuning the Phi-3.5 model using Python and Azure AI Foundry SDK. In this example, we are fine-tuning the model for a customer support chatbot in multiple languages. from azure.ai import Foundry from azure.ai.model import Model # Initialize Azure AI Foundry foundry = Foundry() # Load the Phi-3.5 model model = Model.load("phi-3.5") # Set up the training dataset training_data = foundry.load_dataset("customer_queries_dataset") # Fine-tune the model model.fine_tune(training_data, epochs=5, learning_rate=0.001) # Save the fine-tuned model model.save("fine_tuned_phi_3.5") Best Practices for Evaluating and Validating Fine-Tuned Models Once the model is fine-tuned, it's essential to evaluate and validate its performance before deploying it in production. Split Data for Validation: Always split your dataset into training and validation sets. This ensures that the model is evaluated on unseen data to prevent overfitting. Evaluate Key Metrics: Measure performance using key metrics such as: Accuracy: The proportion of correct predictions. F1 Score: A measure of precision and recall. Confusion Matrix: Helps visualize true vs. false predictions for classification tasks. Cross-Language Validation: If the model is fine-tuned for multiple languages, test its performance across all supported languages to ensure consistency and accuracy. Test in Production-Like Environments: Before full deployment, test the fine-tuned model in a production-like environment to catch any potential issues. Continuous Monitoring and Re-Fine-Tuning: Once deployed, continuously monitor the model’s performance and re-fine-tune it periodically as new data becomes available. Deploying Phi-3.5 Model After fine-tuning the Phi-3.5 model, the next crucial step is deploying it to make it accessible for real-world applications. This section will cover two key deployment strategies: deploying in Azure for cloud-based scaling and reliability, and deploying locally with AI Toolkit for simpler offline usage. Each deployment strategy offers its own advantages depending on the use case. Deploying in Azure Azure provides a powerful environment for deploying machine learning models at scale, enabling organizations to deploy models like Phi-3.5 with high availability, scalability, and robust security features. Azure AI Foundry simplifies the entire deployment pipeline. Set Up Azure AI Foundry Workspace: Log in to Azure AI Foundry and navigate to the workspace where the Phi-3.5 model was fine-tuned. Go to the Deployments section and create a new deployment environment for the model. Choose Compute Resources: Compute Target: Select a compute target suitable for your deployment. For large-scale usage, it’s advisable to choose a GPU-based compute instance. Example: Choose an Azure Kubernetes Service (AKS) cluster for handling large-scale requests efficiently. Configure Scaling Options: Azure allows you to set up auto-scaling based on traffic. This ensures that the model can handle surges in demand without affecting performance. Model Deployment Configuration: Create an Inference Pipeline: In Azure AI Foundry, set up an inference pipeline for your model. Specify the Model: Link the fine-tuned Phi-3.5 model to the deployment pipeline. Deploy the Model: Select the option to deploy the model to the chosen compute resource. Test the Deployment: Once the model is deployed, test the endpoint by sending sample requests to verify the predictions. Configuration Steps (Compute, Resources, Scaling) During deployment, Azure AI Foundry allows you to configure essential aspects like compute type, resource allocation, and scaling options. Compute Type: Choose between CPU or GPU clusters depending on the computational intensity of the model. Resource Allocation: Define the minimum and maximum resources to be allocated for the deployment. For real-time applications, use Azure Kubernetes Service (AKS) for high availability. For batch inference, Azure Container Instances (ACI) is suitable. Auto-Scaling: Set up automatic scaling of the compute instances based on the number of requests. For example, configure the deployment to start with 1 node and scale to 10 nodes during peak usage. Cost Comparison: Phi-3.5 vs. Larger Language Models When comparing the costs of using Phi-3.5 with larger language models (LLMs), several factors come into play, including computational resources, pricing structures, and performance efficiency. Here’s a breakdown: Cost Efficiency Phi-3.5: Designed as a Small Language Model (SLM), Phi-3.5 is optimized for lower computational costs. It offers competitive performance at a fraction of the cost of larger models, making it suitable for budget-conscious projects. The smaller size (3.8 billion parameters) allows for reduced resource consumption during both training and inference. Larger Language Models (e.g., GPT-3.5): Typically require more computational resources, leading to higher operational costs. Larger models may incur additional costs for storage and processing power, especially in cloud environments. Performance vs. Cost Performance Parity: Phi-3.5 has been shown to achieve performance parity with larger models on various benchmarks, including language comprehension and reasoning tasks. This means that for many applications, Phi-3.5 can deliver similar results to larger models without the associated costs. Use Case Suitability: For simpler tasks or applications that do not require extensive factual knowledge, Phi-3.5 is often the more cost-effective choice. Larger models may still be preferred for complex tasks requiring deep contextual understanding or extensive factual recall. Pricing Structure Azure Pricing: Phi-3.5 is available through Azure with a pay-as-you-go billing model, allowing users to scale costs based on usage. Pricing details for Phi-3.5 can be found on the Azure pricing page, where users can customize options based on their needs. Code Example: API Setup and Endpoints for Live Interaction Below is a Python code snippet demonstrating how to interact with a deployed Phi-3.5 model via an API in Azure: import requests # Define the API endpoint and your API key api_url = "https://<your-azure-endpoint>/predict" api_key = "YOUR_API_KEY" # Prepare the input data input_data = { "text": "What are the benefits of renewable energy?" } # Make the API request response = requests.post(api_url, json=input_data, headers={"Authorization": f"Bearer {api_key}"}) # Print the model's response if response.status_code == 200: print("Model Response:", response.json()) else: print("Error:", response.status_code, response.text) Deploying Locally with AI Toolkit For developers who prefer to run models on their local machines, the AI Toolkit provides a convenient solution. The AI Toolkit is a lightweight platform that simplifies local deployment of AI models, allowing for offline usage, experimentation, and rapid prototyping. Deploying the Phi-3.5 model locally using the AI Toolkit is straightforward and can be used for personal projects, testing, or scenarios where cloud access is limited. Introduction to AI Toolkit The AI Toolkit is an easy-to-use platform for deploying language models locally without relying on cloud infrastructure. It supports a range of AI models and enables developers to work in a low-latency environment. Advantages of deploying locally with AI Toolkit: Offline Capability: No need for continuous internet access. Quick Experimentation: Rapid prototyping and testing without the delays of cloud deployments. Setup Guide: Installing and Running Phi-3.5 Locally Using AI Toolkit Install AI Toolkit: Go to the AI Toolkit website and download the platform for your operating system (Linux, macOS, or Windows). Install AI Toolkit by running the appropriate installation command in your terminal. Download the Phi-3.5 Model: Once AI Toolkit is installed, you can download the Phi-3.5 model locally by running: 3. Run the Model Locally: After downloading the model, start a local session by running: This will launch a local server on your machine where the model will be available for interaction. Code Example: Using Phi-3.5 Locally in a Project Below is a Python code example demonstrating how to send a query to the locally deployed Phi-3.5 model running on the AI Toolkit. import requests # Define the local endpoint local_url = "http://localhost:8000/predict" # Prepare the input data input_data = { "text": "What are the benefits of renewable energy?" } # Make the API request response = requests.post(local_url, json=input_data) # Print the model's response if response.status_code == 200: print("Model Response:", response.json()) else: print("Error:", response.status_code, response.text) Comparing Language Capabilities Test Results: How Phi-3.5 Handles Different Languages The Phi-3.5 model demonstrates robust multilingual capabilities, effectively processing and generating text in various languages. Below are comparative examples showcasing its performance in English, Spanish, and Mandarin: English Example: Input: "What are the benefits of renewable energy?" Output: "Renewable energy sources, such as solar and wind, reduce greenhouse gas emissions and promote sustainability." Spanish Example: Input: "¿Cuáles son los beneficios de la energía renovable?" Output: "Las fuentes de energía renovable, como la solar y la eólica, reducen las emisiones de gases de efecto invernadero y promueven la sostenibilidad." Mandarin Example: Input: "可再生能源的好处是什么?" Output: "可再生能源,如太阳能和风能,减少温室气体排放,促进可持续发展。" Performance Benchmarking and Evaluation Across Different Languages Benchmarking Phi-3.5 across different languages involves evaluating its accuracy, fluency, and contextual understanding. For instance, using BLEU scores and human evaluations, the model can be assessed on its translation quality and coherence in various languages. Real-World Use Case: Multilingual Customer Service Chatbot A practical application of Phi-3.5's multilingual capabilities is in developing a customer service chatbot that can interact with users in their preferred language. For instance, the chatbot could provide support in English, Spanish, and Mandarin, ensuring a wider reach and better user experience. Optimizing and Validating Phi-3.5 Model Model Performance Metrics To validate the model's performance in different scenarios, consider the following metrics: Accuracy: Measure how often the model's outputs are correct or align with expected results. Fluency: Assess the naturalness and readability of the generated text. Contextual Understanding: Evaluate how well the model understands and responds to context-specific queries. Tools to Use in Azure and Ollama for Evaluation Azure Cognitive Services: Utilize tools like Text Analytics and Translator to evaluate performance. Ollama: Use local testing environments to quickly iterate and validate model outputs. Conclusion In summary, Phi-3.5 exhibits impressive multilingual capabilities, effective deployment options, and robust performance metrics. Its ability to handle various languages makes it a versatile tool for natural language processing applications. Phi-3.5 stands out for its adaptability and performance in multilingual contexts, making it an excellent choice for future NLP projects, especially those requiring diverse language support. We encourage readers to experiment with the Phi-3.5 model using Azure AI Foundry or the AI Toolkit, explore fine-tuning techniques for their specific use cases, and share their findings with the community. For more information on optimized fine-tuning techniques, check out the Ignite Fine-Tuning Workshop. References Customize the Phi-3.5 family of models with LoRA fine-tuning in Azure Fine-tune Phi-3.5 models in Azure Fine Tuning with Azure AI Foundry and Microsoft Olive Hands on Labs and Workshop Customize a model with fine-tuning https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/fine-tuning?tabs=azure-openai%2Cturbo%2Cpython-new&pivots=programming-language-studio Microsoft AI Toolkit - AI Toolkit for VSCode1.8KViews1like2CommentsUnleashing the Power of Model Context Protocol (MCP): A Game-Changer in AI Integration
Artificial Intelligence is evolving rapidly, and one of the most pressing challenges is enabling AI models to interact effectively with external tools, data sources, and APIs. The Model Context Protocol (MCP) solves this problem by acting as a bridge between AI models and external services, creating a standardized communication framework that enhances tool integration, accessibility, and AI reasoning capabilities. What is Model Context Protocol (MCP)? MCP is a protocol designed to enable AI models, such as Azure OpenAI models, to interact seamlessly with external tools and services. Think of MCP as a universal USB-C connector for AI, allowing language models to fetch information, interact with APIs, and execute tasks beyond their built-in knowledge. Key Features of MCP Standardized Communication – MCP provides a structured way for AI models to interact with various tools. Tool Access & Expansion – AI assistants can now utilize external tools for real-time insights. Secure & Scalable – Enables safe and scalable integration with enterprise applications. Multi-Modal Integration – Supports STDIO, SSE (Server-Sent Events), and WebSocket communication methods. MCP Architecture & How It Works MCP follows a client-server architecture that allows AI models to interact with external tools efficiently. Here’s how it works: Components of MCP MCP Host – The AI model (e.g., Azure OpenAI GPT) requesting data or actions. MCP Client – An intermediary service that forwards the AI model's requests to MCP servers. MCP Server – Lightweight applications that expose specific capabilities (APIs, databases, files, etc.). Data Sources – Various backend systems, including local storage, cloud databases, and external APIs. Data Flow in MCP The AI model sends a request (e.g., "fetch user profile data"). The MCP client forwards the request to the appropriate MCP server. The MCP server retrieves the required data from a database or API. The response is sent back to the AI model via the MCP client. Integrating MCP with Azure OpenAI Services Microsoft has integrated MCP with Azure OpenAI Services, allowing GPT models to interact with external services and fetch live data. This means AI models are no longer limited to static knowledge but can access real-time information. Benefits of Azure OpenAI Services + MCP Integration ✔ Real-time Data Fetching – AI assistants can retrieve fresh information from APIs, databases, and internal systems. ✔ Contextual AI Responses – Enhances AI responses by providing accurate, up-to-date information. ✔ Enterprise-Ready – Secure and scalable for business applications, including finance, healthcare, and retail. Hands-On Tools for MCP Implementation To implement MCP effectively, Microsoft provides two powerful tools: Semantic Workbench and AI Gateway. Microsoft Semantic Workbench A development environment for prototyping AI-powered assistants and integrating MCP-based functionalities. Features: Build and test multi-agent AI assistants. Configure settings and interactions between AI models and external tools. Supports GitHub Codespaces for cloud-based development. Explore Semantic Workbench Workbench interface examples Microsoft AI Gateway A plug-and-play interface that allows developers to experiment with MCP using Azure API Management. Features: Credential Manager – Securely handle API credentials. Live Experimentation – Test AI model interactions with external tools. Pre-built Labs – Hands-on learning for developers. Explore AI Gateway Setting Up MCP with Azure OpenAI Services Step 1: Create a Virtual Environment First, create a virtual environment using Python: python -m venv .venv Activate the environment: # Windows venv\Scripts\activate # MacOS/Linux source .venv/bin/activate Step 2: Install Required Libraries Create a requirements.txt file and add the following dependencies: langchain-mcp-adapters langgraph langchain-openai Then, install the required libraries: pip install -r requirements.txt Step 3: Set Up OpenAI API Key Ensure you have your OpenAI API key set up: # Windows setx OPENAI_API_KEY "<your_api_key> # MacOS/Linux export OPENAI_API_KEY=<your_api_key> Building an MCP Server This server performs basic mathematical operations like addition and multiplication. Create the Server File First, create a new Python file: touch math_server.py Then, implement the server: from mcp.server.fastmcp import FastMCP # Initialize the server mcp = FastMCP("Math") MCP.tool() def add(a: int, b: int) -> int: return a + b MCP.tool() def multiply(a: int, b: int) -> int: return a * b if __name__ == "__main__": mcp.run(transport="stdio") Your MCP server is now ready to run. Building an MCP Client This client connects to the MCP server and interacts with it. Create the Client File First, create a new file: touch client.py Then, implement the client: import asyncio from mcp import ClientSession, StdioServerParameters from langchain_openai import ChatOpenAI from mcp.client.stdio import stdio_client # Define server parameters server_params = StdioServerParameters( command="python", args=["math_server.py"], ) # Define the model model = ChatOpenAI(model="gpt-4o") async def run_agent(): async with stdio_client(server_params) as (read, write): async with ClientSession(read, write) as session: await session.initialize() tools = await load_mcp_tools(session) agent = create_react_agent(model, tools) agent_response = await agent.ainvoke({"messages": "what's (4 + 6) x 14?"}) return agent_response["messages"][3].content if __name__ == "__main__": result = asyncio.run(run_agent()) print(result) Your client is now set up and ready to interact with the MCP server. Running the MCP Server and Client Step 1: Start the MCP Server Open a terminal and run: python math_server.py This starts the MCP server, making it available for client connections. Step 2: Run the MCP Client In another terminal, run: python client.py Expected Output 140 This means the AI agent correctly computed (4 + 6) x 14 using both the MCP server and GPT-4o. Conclusion Integrating MCP with Azure OpenAI Services enables AI applications to securely interact with external tools, enhancing functionality beyond text-based responses. With standardized communication and improved AI capabilities, developers can build smarter and more interactive AI-powered solutions. By following this guide, you can set up an MCP server and client, unlocking the full potential of AI with structured external interactions. Next Steps: Explore more MCP tools and integrations. Extend your MCP setup to work with additional APIs. Deploy your solution in a cloud environment for broader accessibility. For further details, visit the GitHub repository for MCP integration examples and best practices. MCP GitHub Repository MCP Documentation Semantic Workbench AI Gateway MCP Video Walkthrough MCP Blog MCP Github End to End Demo62KViews11likes6CommentsUnderstanding Azure OpenAI Service Quotas and Limits: A Beginner-Friendly Guide
Azure OpenAI Service allows developers, researchers, and students to integrate powerful AI models like GPT-4, GPT-3.5, and DALL·E into their applications. But with great power comes great responsibility and limits. Before you dive into building your next AI-powered solution, it's crucial to understand how quotas and limits work in the Azure OpenAI ecosystem. This guide is designed to help students and beginners easily understand the concept of quotas, limits, and how to manage them effectively. What Are Quotas and Limits? Think of Azure's quotas as your "AI data pack." It defines how much you can use the service. Meanwhile, limits are hard boundaries set by Azure to ensure fair use and system stability. Quota The maximum number of resources (e.g., tokens, requests) allocated to your Azure subscription. Limit The technical cap imposed by Azure on specific resources (e.g., number of files, deployments). Key Metrics: TPM & RPM Tokens Per Minute (TPM) TPM refers to how many tokens you can use per minute across all your requests in each region. A token is a chunk of text. For example, the word "Hello" is 1 token, but "Understanding" might be 2 tokens. Each model has its own default TPM. Example: GPT-4 might allow 240,000 tokens per minute. You can split this quota across multiple deployments. Requests Per Minute (RPM) RPM defines how many API requests you can make every minute. For instance, GPT-3.5-turbo might allow 350 RPM. DALL·E image generation models might allow 6 RPM. Deployment, File, and Training Limits Here are some standard limits imposed on your OpenAI resource: Resource Type Limit Standard model deployments 32 Fine-tuned model deployments 5 Training jobs 100 total per resource (1 active at a time) Fine-tuning files 50 files (total size: 1 GB) Max prompt tokens per request Varies by model (e.g., 4096 tokens for GPT-3.5) How to View and Manage Your Quota Step-by-Step: Go to the Azure Portal. Navigate to your Azure OpenAI resource. Click on "Usage + quotas" in the left-hand menu. You will see TPM, RPM, and current usage status. To Request More Quota: In the same "Usage + quotas" panel, click on "Request quota increase". Fill in the form: Select the region. Choose the model family (e.g., GPT-4, GPT-3.5). Enter the desired TPM and RPM values. Submit and wait for Azure to review and approve. What is Dynamic Quota? Sometimes, Azure gives you extra quota based on demand and availability. “Dynamic quota” is not guaranteed and may increase or decrease. Useful for short-term spikes but should not be relied on for production apps. Example: During weekends, your GPT-3.5 TPM may temporarily increase if there's less traffic in your region. Best Practices for Students Monitor Regularly: Use the Azure Portal to keep an eye on your usage. Batch Requests: Combine multiple tasks in one API call to save tokens. Start Small: Begin with GPT-3.5 before requesting GPT-4 access. Plan Ahead: If you're preparing a demo or a project, request quota in advance. Handle Limits Gracefully: Code should manage 429 Too Many Requests errors. Quick Resources Azure OpenAI Quotas and Limits How to Request Quota in Azure Join the Conversation on Azure AI Foundry Discussions! Have ideas, questions, or insights about AI? Don't keep them to yourself! Share your thoughts, engage with experts, and connect with a community that’s shaping the future of artificial intelligence. 🧠✨ 👉 Click here to join the discussion!2.8KViews0likes0CommentsGetting Started with the AI Toolkit: A Beginner’s Guide with Demos and Resources
If you're curious about building AI solutions but don’t know where to start, Microsoft’s AI Toolkit is a great place to begin. Whether you’re a student, developer, or just someone exploring AI for the first time, this toolkit helps you build real-world solutions using Microsoft’s powerful AI services. In this blog, I’ll Walk you through what the AI Toolkit is, how you can get started, and where you can find helpful demos and ready-to-use code samples. What is the AI Toolkit? The AI Toolkit is a collection of tools, templates, and sample apps that make it easier to build AI-powered applications and copilots using Microsoft Azure. With the AI Toolkit, you can: Build intelligent apps without needing deep AI expertise. Use templates and guides that show you how everything works. Quickly prototype and deploy apps with natural language, speech, search, and more. Watch the AI Toolkit in Action Microsoft has created a video playlist that covers the AI Toolkit and shows you how to build apps step-by-step. You can watch the full playlist here: It is especially useful for developers who want to bring AI into their projects, but also for beginners who want to learn by doing. AI Toolkit Playlist – https://aka.ms/AIToolkit/videos These videos help you understand the flow of building AI agents, using Azure OpenAI, and other cognitive services in a hands-on way. Explore Sample Projects on GitHub Microsoft also provides a public GitHub repository where you can find real code examples built using the AI Toolkit. Here’s the GitHub repo: AI Toolkit Samples – https://github.com/Azure-Samples/AI_Toolkit_Samples This repository includes: Sample apps using Azure AI services like OpenAI, Cognitive Search, and Speech. Instructions to deploy apps using Azure. Code that you can clone, test, and build on top of. You don’t have to start from scratch just open the code, understand the structure, and make small edits to experiment. How to Get Started Here’s a simple path if you’re just starting: Watch 2 or 3 videos from the AI Toolkit Playlist. Go to the GitHub repository and try running one of the examples. Make small changes to the code (like updating the prompt or output). Try deploying the solution on Azure by following the guide in the repo. Keep building and learning. Why This Toolkit is Worth Exploring As someone who is also learning and experimenting, I found this toolkit to be: Easy to understand, even for beginners. Focused on real-world applications, not just theory. Helpful for building responsible AI solutions with good documentation. It gives a complete picture — from writing code to deploying apps. Final Thoughts The AI Toolkit helps you start your journey in AI without feeling overwhelmed. It provides real code, real use cases, and practical demos. With the support of Microsoft Learn and Azure samples, you can go from learning to building in no time. If you’re serious about building with AI, this is a resource worth exploring. Continue the discussion in the Azure AI Foundry Discord community at Https://aka.ms/AI/discord Join the Azure AI Foundry Discord Server! References AI Toolkit Playlist (YouTube) https://aka.ms/AIToolkit/videos AI Toolkit GitHub Repository https://github.com/Azure-Samples/AI_Toolkit_Samples Microsoft Learn: AI Toolkit Documentation https://learn.microsoft.com/en-us/azure/ai-services/toolkit/ Azure AI Services https://azure.microsoft.com/en-us/products/ai-services/1.8KViews0likes0CommentsRun Javascript code on Agent Loop
We have recently introduced support for Code interpreters inside of Azure Logic Apps Agent Loop, extending the support we had for Python. When partnered with a LLM, this allow builders to express their goals or intents via natural language and obtain executable results. These capabilities become powerful in the areas of data analysis, visualizations, validations and transformations. Our first language supported for code interpreter is JavaScript, with other languages following later. Historically, customers have had concerns about an LLM performing data analysis, calculations and transformations due to context window exhaustion which can lead to hallucinations. Code interpreters help in this regard as they can perform this analysis without filling up context windows and providing more reliable results. You can see the code interpreter with JavaScript in action in this video from Kent Weare. After watching the video, you can deep dive in the details. How it works When Agent Loop evaluates code generated by an AI agent (for example, through a code interpreter), we run it inside a V8 isolate using the isolated‑vm library. V8 is the JavaScript engine that powers Node.js and Chrome—it’s what actually executes JavaScript code. An isolate is a lightweight, independent environment within V8, with its own memory and execution context. Running code inside an isolate gives us strong separation from the host runtime. Each execution has its own memory (“heap”) and cannot directly access the host’s memory, file system, or network. This helps ensure that agent-generated code stays contained and doesn’t interfere with the rest of the system. This approach is not intended to be a full security sandbox, and we don’t treat it as safe for fully untrusted code. However, it provides meaningful defense-in-depth: Memory usage is limited per isolate, preventing a single execution from consuming all available resources Execution can be bounded with timeouts, allowing us to terminate long-running or infinite loops Failures are isolated, so crashes in agent-generated code won’t bring down the runtime process In practice, this is about reducing blast radius. By isolating execution and enforcing limits, we make sure that code—regardless of whether it’s generated by a user or an AI agent—cannot disrupt the engine that runs it. Use case: Expense Validations To help illustrate, this capability, let’s take an accounts payable example built in Logic Apps Standard. Zava uses a 3 rd party expense application to capture employee expenses. The 3rd party expense application will export transactions in CSV format. Zava has some very specific business validations that need to execute before the expenses can be processed by the ERP. To solve this problem, we will build an agentic business process in Logic Apps that includes our new JavaScript code interpreter. Our code interpreter will be able to ingest and parse our CSV file and then apply our business validations for us. The outcome is a report that identifies both valid and invalid transactions. Prior to uploading to the ERP (Dataverse), we will route our request to a human in the loop process for their oversight. This allows for additional control as unwinding in an ERP is always a tedious task. Below, is a picture of our solution. Within it we can see both deterministic steps before and after our Agent action. Within our agent action, we have tools that will help our agent address our company objectives. These tools include calling a batch API to upload valid expense records to Dataverse. Another tool that will take care of uploading invalid records to a different table, our human in the loop action to seek approval from our human stakeholder and a tool that will help us obtain business knowledge from SharePoint. You might be asking, ok where does the code interpreter come in? Within our Agent action, we will discover a toggle that allows you to enable it. The code interpreter gets invoked based upon instruction in the model. Here is a subset of the prompt from this workflow that describes how to invoke the code interpreter. For example: ### Step 2 -- Parse and Validate The expense CSV data is available from the Get_file_content action. Use code interpreter to parse ALL rows from the CSV. For each row, normalize: Category: title case - Amount: decimal number - SubmittedDate: ISO 8601 format (e.g. "2026-01-05T00:00:00Z") - ReceiptAttached: convert "Yes"/"No" to true/false Then apply the business rules from Step 1 to classify every record as VALID or INVALID. You won’t see the code interpreter modelled as a tool within our agent action, but we see the execution outcome within our run history. In the following screenshot we can see this illustrated. Within our agent action, we can see that we are on our 4 th turn and we have executed the code interpreter action. In the code window, we can see the code that was generated for our us. This is the result of the LLM working together with the code interpreter to generate and execute this code. Note: In this scenario, we are dynamically generating this code at runtime. This allows for ultimate flexibility if we have different source inputs and we are relying upon the LLM and code interpreter to adapt to these fluid inputs. If we were interested in a more deterministic approach we can also pass pre-written code into this action where it can also execute. This will result in less flexibility, but more deterministic behavior. Running JavaScript code in Logic Apps Consumption Agent Loop Logic Apps Consumption has a slightly different architecture to Logic Apps Standard. In Logic Apps Standard, we offer dedicated compute and storage for customers which provides workload isolation across customers. When it comes to Logic Apps Consumption, we provide a multi-tenant offering allowing customers to take advantage of a lower price point due to shared resource utilization. In order to allow customer isolation, customers need to have an integration account attached to their consumption workflow. This will allow the code interpreter to run in isolated compute thus avoiding any potential disruptions to other customers. You can provision an Integration Account by searching for Integration Accounts at the top of the Azure portal. You can select any of the SKUs available, including the Free SKU for non-production/non-SLA scenarios. With an Integration Account created, we can associate this Integration Account with our consumption logic app by clicking on Settings – Integration Account.571Views0likes0CommentsSigning in to Microsoft Foundry from OpenClaw using Azure AD: a smoother way to bring your models in
This post is a quick update to walk through the new flow. If you read the previous one, think of this as the easier path I wish I had the first time round. If you have not seen the original, you can find it here: Integrating Microsoft Foundry with OpenClaw: Step by Step Model Configuration | Microsoft Community Hub Pre-requisite: You will need the Azure CLI (azure-cli) installed on your machine. The official install guide for Linux is here: https://learn.microsoft.com/en-us/cli/azure/install-azure-cli-linux?view=azure-cli-latest I am on Linux so I went the Homebrew route, which keeps things simple. The formula is here: https://formulae.brew.sh/formula/azure-cli Microsoft also has official docs covering the Homebrew/Linuxbrew install: https://learn.microsoft.com/en-us/cli/azure/install-azure-cli-macos?view=azure-cli-latest#install-with-homebrew Once Homebrew is ready, run this in your terminal: brew install azure-cli Why this matters: Before this update, every Foundry model you wanted to use in OpenClaw needed its own API key and endpoint pasted into the config. It worked, but it was tedious, and keys are easy to leak if you are copying them around. The Azure AD path solves both problems. You authenticate as yourself (or a service principal), OpenClaw asks Azure for the list of Foundry resources you have access to, and it brings the models in automatically. Signing in to Microsoft Foundry from OpenClaw via Azure AD A device-code OAuth handshake replaces the old static-API-key flow. OpenClaw delegates auth to the local Azure CLI; the CLI handles the browser-side sign-in, holds the resulting tokens, and refreshes them silently. OpenClaw then walks the Azure resource graph, subscriptions → Foundry resources → model deployments and registers each model into its own config. No API keys move through OpenClaw at any point. Sequence diagram of the OAuth 2.0 device-authorization flow as orchestrated by OpenClaw. Phases 1–3 establish identity (the developer authenticates once, in a real browser, against Azure AD). Phases 4–5 perform service discovery (OpenClaw walks the ARM resource hierarchy, subscriptions → Foundry accounts → model deployments and persists the result to a local provider config). After registration, every model call OpenClaw makes against Foundry reuses the same Azure-CLI-managed token cache: tokens refresh transparently, and access is gated by the Foundry resource's RBAC assignments rather than a static API key. Dashed lines denote return values; the teal line in step 7 marks the single token-issuance event the rest of the system pivots on. Walking through the new flow: Start with the command to onboard openclaw as if you were setting up OpenClaw for the first time: openclaw onboard Kick things off with the OpenClaw onboard command, the same one you would use when setting up OpenClaw for the first time. When it prompts you, choose update values. Next, you will be asked to configure your models. Scroll down a little and you will see Microsoft Foundry listed as a supported provider. Pick it. From here, you have two options. You can sign in with an API key, which is what I covered in the previous blog post, or you can sign in through Azure AD. The Azure AD path is easier and more secure, so that is the one we will use. OpenClaw will give you a URL and a device code. Copy the URL into your browser and use the code to complete the sign in. (This is where the az CLI from the pre-requisite section earns its keep.) If everything worked, you should see a success prompt similar to this: Once you are signed in, OpenClaw will ask you to pick the Azure subscription that your Microsoft Foundry resource lives in. Pick the subscription, then pick the Foundry resource where your models are deployed. And that is pretty much it. All the models you have deployed to that Foundry resource get pulled into OpenClaw automatically. Compared to the old way of pasting API keys and endpoints one by one, this is a huge time saver, and you do not have to babysit any keys. From here you can start using your Foundry-deployed models inside OpenClaw straight away: Wrapping up The Azure AD sign-in option in OpenClaw is one of those small updates that quietly removes a real pain point. If you have ever juggled multiple Foundry endpoints and rotated keys across them, you already know why. With this flow, you sign in once, your models show up, and you can get back to actually building. If you have not tried OpenClaw with Microsoft Foundry yet, this is a good time to give it a go. And if you were holding off because of the key management overhead, that excuse is gone now. References Previous post on integrating Microsoft Foundry with OpenClaw using API keys: Integrating Microsoft Foundry with OpenClaw: Step by Step Model Configuration | Microsoft Community Hub Install the Azure CLI on Linux: https://learn.microsoft.com/en-us/cli/azure/install-azure-cli-linux?view=azure-cli-latest Install the Azure CLI on macOS: https://learn.microsoft.com/en-us/cli/azure/install-azure-cli-macos?view=azure-cli-latest#install-with-homebrew Homebrew formula for azure-cli: https://formulae.brew.sh/formula/azure-cli139Views0likes0CommentsPower Up Your Open WebUI with Azure AI Speech: Quick STT & TTS Integration
Introduction Ever found yourself wishing your web interface could really talk and listen back to you? With a few clicks (and a bit of code), you can turn your plain Open WebUI into a full-on voice assistant. In this post, you’ll see how to spin up an Azure Speech resource, hook it into your frontend, and watch as user speech transforms into text and your app’s responses leap off the screen in a human-like voice. By the end of this guide, you’ll have a voice-enabled web UI that actually converses with users, opening the door to hands-free controls, better accessibility, and a genuinely richer user experience. Ready to make your web app speak? Let’s dive in. Why Azure AI Speech? We use Azure AI Speech service in Open Web UI to enable voice interactions directly within web applications. This allows users to: Speak commands or input instead of typing, making the interface more accessible and user-friendly. Hear responses or information read aloud, which improves usability for people with visual impairments or those who prefer audio. Provide a more natural and hands-free experience especially on devices like smartphones or tablets. In short, integrating Azure AI Speech service into Open Web UI helps make web apps smarter, more interactive, and easier to use by adding speech recognition and voice output features. If you haven’t hosted Open WebUI already, follow my other step-by-step guide to host Ollama WebUI on Azure. Proceed to the next step if you have Open WebUI deployed already. Learn More about OpenWeb UI here. Deploy Azure AI Speech service in Azure. Navigate to the Azure Portal and search for Azure AI Speech on the Azure portal search bar. Create a new Speech Service by filling up the fields in the resource creation page. Click on “Create” to finalize the setup. After the resource has been deployed, click on “View resource” button and you should be redirected to the Azure AI Speech service page. The page should display the API Keys and Endpoints for Azure AI Speech services, which you can use in Open Web UI. Settings things up in Open Web UI Speech to Text settings (STT) Head to the Open Web UI Admin page > Settings > Audio. Paste the API Key obtained from the Azure AI Speech service page into the API key field below. Unless you use different Azure Region, or want to change the default configurations for the STT settings, leave all settings to blank. Text to Speech settings (TTS) Now, let's proceed with configuring the TTS Settings on OpenWeb UI by toggling the TTS Engine to Azure AI Speech option. Again, paste the API Key obtained from Azure AI Speech service page and leave all settings to blank. You can change the TTS Voice from the dropdown selection in the TTS settings as depicted in the image below: Click Save to reflect the change. Expected Result Now, let’s test if everything works well. Open a new chat / temporary chat on Open Web UI and click on the Call / Record button. The STT Engine (Azure AI Speech) should identify your voice and provide a response based on the voice input. To test the TTS feature, click on the Read Aloud (Speaker Icon) under any response from Open Web UI. The TTS Engine should reflect Azure AI Speech service! Conclusion And that’s a wrap! You’ve just given your Open WebUI the gift of capturing user speech, turning it into text, and then talking right back with Azure’s neural voices. Along the way you saw how easy it is to spin up a Speech resource in the Azure portal, wire up real-time transcription in the browser, and pipe responses through the TTS engine. From here, it’s all about experimentation. Try swapping in different neural voices or dialing in new languages. Tweak how you start and stop listening, play with silence detection, or add custom pronunciation tweaks for those tricky product names. Before you know it, your interface will feel less like a web page and more like a conversation partner.2.3KViews3likes2CommentsIntroducing Skills in Azure API Center
The problem Modern applications depend on more than APIs. A single AI workflow might call an LLM, invoke an MCP tool, integrate a third-party service, and reference a business capability spanning dozens of endpoints. Without a central inventory, these assets become impossible to discover, easy to duplicate, and painful to govern. Azure API Center — part of the Azure API Management platform — already catalogs models, agents, and MCP servers alongside traditional APIs. Skills extend that foundation to cover reusable AI capabilities. What is a Skill? As AI agents become more capable, organizations need a way to define and govern what those agents can actually do. Skills are the answer. A Skill in Azure API Center is a reusable, registered capability that AI agents can discover and consume to extend their functionality. Each skill is backed by source code — typically hosted in a Git repository — and describes what it does, what APIs or MCP servers it can access, and who owns it. Think of skills as the building blocks of AI agent behavior, promoted into a governed inventory alongside your APIs, MCP servers, models, and agents. Example: A "Code Review Skill" performs automated code reviews using static analysis. It is registered in API Center with a Source URL pointing to its GitHub repo, allowed to access your code analysis API, and discoverable by any AI agent in your organization. How Skills work in API Center Skills can be added to your inventory in two ways: registered manually through the Azure portal, or synchronized automatically from a connected Git repository. Both approaches end up in the same governed catalog, discoverable through the API Center portal. Option 1: Register a Skill manually Use the Azure portal to register a skill directly. Navigate to Inventory > Assets in your API center, select + Register an asset > Skill, and fill in the registration form. Figure 2: Register a skill form in the Azure portal. The form captures everything needed to make a skill discoverable and governable: Field Description Title Display name for the skill (e.g. Code Review Skill). Identification Auto-generated URL slug based on the title. Editable. Summary One-line description of what the skill does. Description Full detail on capabilities, use cases, and expected behavior. Lifecycle stage Current state: Design, Preview, Production, or Deprecated. Source URL Git repository URL for the skill source code. Allowed tools The APIs or MCP servers from your inventory this skill is permitted to access. Enforces governance at the capability level. License Licensing terms: MIT, Apache 2.0, Proprietary, etc. Contact information Owner or support contact for the skill. Governance note: The Allowed tools field is key for AI governance. It explicitly defines which APIs and MCP servers a skill can invoke — preventing uncontrolled access and making security review straightforward. Option 2: Sync Skills from a Git repository For teams managing skills in source control, API Center can integrate directly with a Git repository and synchronize skill information automatically. This is the recommended approach for teams practicing GitOps or managing many skills at scale. Figure 3: Integrating a Git repository to sync skills automatically into API Center. When you configure a Git integration, API Center: Creates an Environment representing the repository as a source of skills Scans for files matching the configured pattern (default: **/skill.md) Syncs matching skills into your inventory and keeps them current as the repo changes For private repositories, a Personal Access Token (PAT) stored in Azure Key Vault is used for authentication. API Center's managed identity retrieves the PAT securely — no credentials are stored in the service itself. Tip: Use the Automatically configure managed identity and assign permissions option when setting up the integration if you haven't pre-configured a managed identity. API Center handles the Key Vault permissions automatically. Discovering Skills in your catalog Once registered — manually or via Git — skills appear in the Inventory > Assets page alongside all other asset types. Linked skills (synced from Git) are visually identified with a link icon, so teams can see at a glance which skills are source-controlled. From the API Center portal, developers and other stakeholders can browse the full skill catalog, filter by lifecycle stage or type, and view detailed information about each skill — including its source URL, allowed tools, and contact information. Figure 4: Skills catalog in API Center portal, showing registered skills and the details related to the skill. Developer experience: The API Center portal gives teams a self-service way to discover approved skills without needing to ask around or search GitHub. The catalog becomes the authoritative source of what's available and what's allowed. Why this matters for AI development teams Skills close a critical gap in AI governance. As organizations deploy AI agents, they need to know — and control — what those agents can do. Without a governed skill registry, capability discovery is ad hoc, reuse is low, and security review is difficult. By bringing skills into Azure API Center alongside APIs, MCP servers, models, and agents, teams get: A single inventory for all the assets AI agents depend on Explicit governance over which resources each skill can access via Allowed tools Automated, source-controlled skill registration via Git integration Discoverability for developers and AI systems through the API Center portal Consistent lifecycle management — Design through Production to Deprecated API Center, as part of the Azure API Management platform and the broader AI Gateway vision, is evolving into the system of record for AI-ready development. Skills are the latest step in that direction. Available now Skills are available today in Azure API Center (preview). To register your first skill: Sign in to the Azure portal and navigate to your API Center instance In the sidebar, select Inventory > Assets Select + Register an asset > Skill Fill in the registration form and select Create → Register and discover skills in Azure API Center (docs) → Set up your API Center portal → Explore the Azure API Management platform2.1KViews0likes2CommentsProvePresent: Ending Proxy Attendance with Azure Serverless & Azure OpenAI
Problem Most schools use a smart‑card‑based attendance system where students tap their cards on a reader. However, this method is unreliable because students can give their cards to friends or simply tap and leave immediately. Teachers cannot accurately assess real student performance—whether high‑performing students are genuinely attending class or whether poor performance is due to actual absence. Another issue is that even if students are physically present in a lecture, teachers still cannot tell whether they are paying attention to the projector or actually learning. The current workaround is for teachers to override the attendance record by calling each student one by one, which is time‑consuming in large lectures and adds little educational value. It is also only a one‑time check, meaning students can still leave the lecture room immediately afterwards. Another issue is that we have many out‑of‑school activities such as site visit, and the school needs to ensure everyone’s presence promptly in each check point. This kind of problem isn’t unique to schools. It’s a common challenge for event organizers, where verifying attendee presence is essential but often slow, causing long queues. Organizers usually rely on a few mobile scanners to check in attendees one by one. Solution ProvePresent is an AI tool designed to verify attendance and create real‑time challenges for participants, ensuring that attendance records are authentic and that attendees remain focused on the presentation. It uses OTP login with school email. Check-in and Check-out With a Real‑time QR Code The code refreshes every 25 seconds, and the presenter can display it on the projector for everyone to scan when checking in at the beginning and checking out at the end of the session. However, this alone cannot prevent someone from capturing the code and sending it to others who are not in the room, or from using two devices to help someone else scan for attendance—even if geolocation checks are enabled. We will explain this next. This check‑in and check‑out process is highly scalable, and no one needs to queue while waiting for someone to scan their QR code! Organizers can set geolocation restrictions to prevent anyone from checking in remotely in a simple manner. Keep Attendee Alive with Signalr The SignalR live connection allows the presenter to create real‑time challenges for attendees, helping to verify their presence and ensure they are genuinely focused on the presentation. AI Powered Live Quiz The presenter shares their presentation screen, and two Microsoft Foundry agents with Azure OpenAI Chatgpt 5.3 —ImageAnalysisAgent, which extracts key information from the shared screen, and QuizQuestionGenerator, which generates simple questions based on the current slide—work together to create challenges. The question is broadcast to all online attendees, who must answer within 20 seconds. This feature keeps attendees on the webpage and prevents them from doing anything unrelated to the presentation. Detailed report can be downloaded for further analysis. Attendee Photo Capture Request all online students to capture and upload photos of their venue view. The system will analyze the images to estimate seating positions using Microsoft Foundry agents with Azure OpenAI ChatGPT 5.3 PositionEstimationAgent and complete an image challenge. When the presenter clicks Capture Attendee Photos, all online attendees are prompted to take a photo and upload it to blob storage. The PositionEstimationAgent then analyzes the image to estimate their seating location, which can provide insights into student performance. Analysis Notes: Analyzed 13 students in 2 overlapping batches. Batch 1: The venue is a computer lab with the projector screen at the front center, whiteboards on the left, and cabinets on the right. Relative depth was estimated mainly from screen size and number of monitor rows visible ahead. Column estimates were inferred from screen angle and side-room features, with lower confidence for the rotated side-view image. Batch 2: These six photos appear to come from the same computer lab with the projector at the front center. Relative depth was estimated mainly from projector size and number of visible desk/monitor rows ahead. Left-right placement was inferred from projector skew and side-wall visibility. Within this batch, 240124734 and 240167285 seem closest to the front, 240286514 and 240158424 are slightly farther back, 240293498 is farther back again, and 240160364 appears furthest. Pass around the QR code attendance sheet Traditionally, the attendance sheet is circulated for attendees to sign, but this method is unreliable because no one monitors the signing process, allowing one attendee to sign for someone who is absent. It is also slow and not scalable for large groups. The QR Code attendance sheet functions as a chain. The presenter randomly distributes a short‑lived, one‑time QR code—representing a virtual attendance sheet—to any number of attendees, just like handing out multiple physical sheets. Each attendee must find another participant to scan their code to record attendance, continuing the chain until the final group of attendees. The presenter then verifies the last group’s presence. The first chain is a dead chain because that student left the venue and cannot find another student to scan his QR code. The second chain contains 20 student attendance records. It also provides useful insights into their friendship and seating patterns. Architecture This project is built using Vibe Coding, so we will not share highly technical details in this post. If you'd like to learn more, leave a comment, and we will write another blog to cover the specifics. GitHub Repo https://github.com/wongcyrus/ProvePresent Conclusion ProvePresent demonstrates how Azure serverless technology and Azure OpenAI can work together to solve a long‑standing problem in education: verifying genuine student presence and engagement. By combining real‑time QR code verification, SignalR‑powered live interactions, AI‑generated quizzes, and intelligent photo‑based seating analysis, we created a system where “being present” is no longer just a checkbox—it becomes a verifiable, interactive, and meaningful part of the learning experience. Instead of relying on outdated smart‑card systems or manual roll calls, educators gain a dynamic tool that keeps students attentive, provides insight into classroom behavior, and produces useful analytics for improving teaching outcomes. Students, in turn, benefit from an engaging, modern attendance experience that aligns with how digital‑native learners expect classes to operate. This is only the beginning. With Microsoft Foundry agents and the flexibility of Azure Functions, there are many opportunities to extend ProvePresent further—richer analytics, smarter engagement models, and seamless integration with LMS platforms. If there’s interest, we’re happy to share more technical details, architectural deep dives, and future roadmap ideas in a follow‑up post. Thank you for the contribution of Microsoft Student Ambassadors Hong Kong Institute of Information Technology (HKIIT) Wong Wing Ho, CHAN Sham Jayson, Pang Ho Shum, and Chan Ka Chun. They are major in Higher Diploma in Cloud and Data Centre Administration. About the Author Cyrus Wong is the senior lecturer of Hong Kong Institute of Information Technology (HKIIT) @ IVE(Lee Wai Lee).and he focuses on teaching public Cloud technologies. He is a passionate advocate for the adoption of cloud technology across various media and events. With his extensive knowledge and expertise, he has earned prestigious recognitions such as AWS Builder Center, Microsoft MVP- Microsoft Foundry, and Google Developer Expert for Google Cloud Platform & AI.199Views0likes0CommentsIntegrating Microsoft Foundry with OpenClaw: Step by Step Model Configuration
Step 1: Deploying Models on Microsoft Foundry Let us kick things off in the Azure portal. To get our OpenClaw agent thinking like a genius, we need to deploy our models in Microsoft Foundry. For this guide, we are going to focus on deploying gpt-5.2-codex on Microsoft Foundry with OpenClaw. Navigate to your AI Hub, head over to the model catalog, choose the model you wish to use with OpenClaw and hit deploy. Once your deployment is successful, head to the endpoints section. Important: Grab your Endpoint URL and your API Keys right now and save them in a secure note. We will need these exact values to connect OpenClaw in a few minutes. Step 2: Installing and Initializing OpenClaw Next up, we need to get OpenClaw running on your machine. Open up your terminal and run the official installation script: curl -fsSL https://openclaw.ai/install.sh | bash The wizard will walk you through a few prompts. Here is exactly how to answer them to link up with our Azure setup: First Page (Model Selection): Choose "Skip for now". Second Page (Provider): Select azure-openai-responses. Model Selection: Select gpt-5.2-codex , For now only the models listed (hosted on Microsoft Foundry) in the picture below are available to be used with OpenClaw. Follow the rest of the standard prompts to finish the initial setup. Step 3: Editing the OpenClaw Configuration File Now for the fun part. We need to manually configure OpenClaw to talk to Microsoft Foundry. Open your configuration file located at ~/.openclaw/openclaw.json in your favorite text editor. Replace the contents of the models and agents sections with the following code block: { "models": { "providers": { "azure-openai-responses": { "baseUrl": "https://<YOUR_RESOURCE_NAME>.openai.azure.com/openai/v1", "apiKey": "<YOUR_AZURE_OPENAI_API_KEY>", "api": "openai-responses", "authHeader": false, "headers": { "api-key": "<YOUR_AZURE_OPENAI_API_KEY>" }, "models": [ { "id": "gpt-5.2-codex", "name": "GPT-5.2-Codex (Azure)", "reasoning": true, "input": ["text", "image"], "cost": { "input": 0, "output": 0, "cacheRead": 0, "cacheWrite": 0 }, "contextWindow": 400000, "maxTokens": 16384, "compat": { "supportsStore": false } }, { "id": "gpt-5.2", "name": "GPT-5.2 (Azure)", "reasoning": false, "input": ["text", "image"], "cost": { "input": 0, "output": 0, "cacheRead": 0, "cacheWrite": 0 }, "contextWindow": 272000, "maxTokens": 16384, "compat": { "supportsStore": false } } ] } } }, "agents": { "defaults": { "model": { "primary": "azure-openai-responses/gpt-5.2-codex" }, "models": { "azure-openai-responses/gpt-5.2-codex": {} }, "workspace": "/home/<USERNAME>/.openclaw/workspace", "compaction": { "mode": "safeguard" }, "maxConcurrent": 4, "subagents": { "maxConcurrent": 8 } } } } You will notice a few placeholders in that JSON. Here is exactly what you need to swap out: Placeholder Variable What It Is Where to Find It <YOUR_RESOURCE_NAME> The unique name of your Azure OpenAI resource. Found in your Azure Portal under the Azure OpenAI resource overview. <YOUR_AZURE_OPENAI_API_KEY> The secret key required to authenticate your requests. Found in Microsoft Foundry under your project endpoints or Azure Portal keys section. <USERNAME> Your local computer's user profile name. Open your terminal and type whoami to find this. Step 4: Restart the Gateway After saving the configuration file, you must restart the OpenClaw gateway for the new Foundry settings to take effect. Run this simple command: openclaw gateway restart Configuration Notes & Deep Dive If you are curious about why we configured the JSON that way, here is a quick breakdown of the technical details. Authentication Differences Azure OpenAI uses the api-key HTTP header for authentication. This is entirely different from the standard OpenAI Authorization: Bearer header. Our configuration file addresses this in two ways: Setting "authHeader": false completely disables the default Bearer header. Adding "headers": { "api-key": "<key>" } forces OpenClaw to send the API key via Azure's native header format. Important Note: Your API key must appear in both the apiKey field AND the headers.api-key field within the JSON for this to work correctly. The Base URL Azure OpenAI's v1-compatible endpoint follows this specific format: https://<your_resource_name>.openai.azure.com/openai/v1 The beautiful thing about this v1 endpoint is that it is largely compatible with the standard OpenAI API and does not require you to manually pass an api-version query parameter. Model Compatibility Settings "compat": { "supportsStore": false } disables the store parameter since Azure OpenAI does not currently support it. "reasoning": true enables the thinking mode for GPT-5.2-Codex. This supports low, medium, high, and xhigh levels. "reasoning": false is set for GPT-5.2 because it is a standard, non-reasoning model. Model Specifications & Cost Tracking If you want OpenClaw to accurately track your token usage costs, you can update the cost fields from 0 to the current Azure pricing. Here are the specs and costs for the models we just deployed: Model Specifications Model Context Window Max Output Tokens Image Input Reasoning gpt-5.2-codex 400,000 tokens 16,384 tokens Yes Yes gpt-5.2 272,000 tokens 16,384 tokens Yes No Current Cost (Adjust in JSON) Model Input (per 1M tokens) Output (per 1M tokens) Cached Input (per 1M tokens) gpt-5.2-codex $1.75 $14.00 $0.175 gpt-5.2 $2.00 $8.00 $0.50 Conclusion: And there you have it! You have successfully bridged the gap between the enterprise-grade infrastructure of Microsoft Foundry and the local autonomy of OpenClaw. By following these steps, you are not just running a chatbot; you are running a sophisticated agent capable of reasoning, coding, and executing tasks with the full power of GPT-5.2-codex behind it. The combination of Azure's reliability and OpenClaw's flexibility opens up a world of possibilities. Whether you are building an automated devops assistant, a research agent, or just exploring the bleeding edge of AI, you now have a robust foundation to build upon. Now it is time to let your agent loose on some real tasks. Go forth, experiment with different system prompts, and see what you can build. If you run into any interesting edge cases or come up with a unique configuration, let me know in the comments below. Happy coding!10KViews2likes2Comments