azureml
6 TopicsHow to Set Up Claude Code with Microsoft Foundry Models on macOS
Introduction Building with AI isn't just about picking a smart model. It is about where that model lives. I chose to route my Claude Code setup through Microsoft Foundry because I needed more than just a raw API. I wanted the reliability, compliance, and structured management that comes with Microsoft's ecosystem. When you are moving from a prototype to something real, having that level of infrastructure backing your calls makes a significant difference. The challenge is that Foundry is designed for enterprise cloud environments, while my daily development work happens locally on a MacBook. Getting the two to communicate seamlessly involved navigating a maze of shell configurations and environment variables that weren't immediately obvious. I wrote this guide to document the exact steps for bridging that gap. Here is how you can set up Claude Code to run locally on macOS while leveraging the stability of models deployed on Microsoft Foundry. Requirements Before we open the terminal, let's make sure you have the necessary accounts and environments ready. Since we are bridging a local CLI with an enterprise cloud setup, having these credentials handy now will save you time later. Azure Subscription with Microsoft Foundry Setup - This is the most critical piece. You need an active Azure subscription where the Microsoft Foundry environment is initialized. Ensure that you have deployed the Claude model you intend to use and that the deployment status is active. You will need the specific endpoint URL and the associated API keys from this deployment to configure the connection. An Anthropic User Account - Even though the compute is happening on Azure, the interface requires an Anthropic account. You will need this to authenticate your session and manage your user profile settings within the Claude Code ecosystem. Claude Code Client on macOS - We will be running the commands locally, so you need the Claude Code CLI installed on your MacBook. Step 1: Install Claude Code on macOS The recommended installation method is via Homebrew or Curl, which sets it up for terminal access ("OS level"). Option A: Homebrew (Recommended) brew install --cask claude-code Option B: Curl curl -fsSL https://claude.ai/install.sh | bash Verify Installation: Run claude --version. Step 2: Set Up Microsoft Foundry to deploy Claude model Navigate to your Microsoft Foundry portal, and find the Claude model catalog, and deploy the selected Claude model. [Microsoft Foundry > My Assets > Models + endpoints > + Deploy Model > Deploy Base model > Search for "Claude"] In your Model Deployment dashboard, go to the deployed Claude Models and get the "Endpoints and keys". Store it somewhere safe, because we will need them to configure Claude Code later on. Configure Environment Variables in MacOS terminal: Now we need to tell your local Claude Code client to route requests through Microsoft Foundry instead of the default Anthropic endpoints. This is handled by setting specific environment variables that act as a bridge between your local machine and your Azure resources. You could run these commands manually every time you open a terminal, but it is much more efficient to save them permanently in your shell profile. For most modern macOS users, this file is .zshrc. Open your terminal and add the following lines to your profile, making sure to replace the placeholder text with your actual Azure credentials: export CLAUDE_CODE_USE_FOUNDRY=1 export ANTHROPIC_FOUNDRY_API_KEY="your-azure-api-key" export ANTHROPIC_FOUNDRY_RESOURCE="your-resource-name" # Specify the deployment name for Opus export CLAUDE_CODE_MODEL="your-opus-deployment-name" Once you have added these variables, you need to reload your shell configuration for the changes to take effect. Run the source command below to update your current session, and then verify the setup by launching Claude: source ~/.zshrc claude If everything is configured correctly, the Claude CLI will initialize using your Microsoft Foundry deployment as the backend. Once you execute the claude command, the CLI will prompt you to choose an authentication method. Select Option 2 (Antrophic Console account) to proceed. This action triggers your default web browser and redirects you to the Claude Console. Simply sign in using your standard Anthropic account credentials. After you have successfully signed in, you will be presented with a permissions screen. Click the Authorize button to link your web session back to your local terminal. Return to your terminal window, and you should see a notification confirming that the login process is complete. Press Enter to finalize the setup. You are now fully connected. You can start using Claude Code locally, powered entirely by the model deployment running in your Microsoft Foundry environment. Conclusion Setting up this environment might seem like a heavy lift just to run a CLI tool, but the payoff is significant. You now have a workflow that combines the immediate feedback of local development with the security and infrastructure benefits of Microsoft Foundry. One of the most practical upgrades is the removal of standard usage caps. You are no longer limited to the 5-hour API call limits, which gives you the freedom to iterate, test, and debug for as long as your project requires without hitting a wall. By bridging your local macOS terminal to Azure, you are no longer just hitting an API endpoint. You are leveraging a managed, compliance-ready environment that scales with your needs. The best part is that now the configuration is locked in, you don't need to think about the plumbing again. You can focus entirely on coding, knowing that the reliability of an enterprise platform is running quietly in the background supporting every command.450Views1like0CommentsBuilding with Azure OpenAI Sora: A Complete Guide to AI Video Generation
In this comprehensive guide, we'll explore how to integrate both Sora 1 and Sora 2 models from Azure OpenAI Service into a production web application. We'll cover API integration, request body parameters, cost analysis, limitations, and the key differences between using Azure AI Foundry endpoints versus OpenAI's native API. Table of Contents Introduction to Sora Models Azure AI Foundry vs. OpenAI API Structure API Integration: Request Body Parameters Video Generation Modes Cost Analysis per Generation Technical Limitations & Constraints Resolution & Duration Support Implementation Best Practices Introduction to Sora Models Sora is OpenAI's groundbreaking text-to-video model that generates realistic videos from natural language descriptions. Azure AI Foundry provides access to two versions: Sora 1: The original model focused primarily on text-to-video generation with extensive resolution options (480p to 1080p) and flexible duration (1-20 seconds) Sora 2: The enhanced version with native audio generation, multiple generation modes (text-to-video, image-to-video, video-to-video remix), but more constrained resolution options (720p only in public preview) Azure AI Foundry vs. OpenAI API Structure Key Architectural Differences Sora 1 uses Azure's traditional deployment-based API structure: Endpoint Pattern: https://{resource-name}.openai.azure.com/openai/deployments/{deployment-name}/... Parameters: Uses Azure-specific naming like n_seconds, n_variants, separate width/height fields Job Management: Uses /jobs/{id} for status polling Content Download: Uses /video/generations/{generation_id}/content/video Sora 2 adapts OpenAI's v1 API format while still being hosted on Azure: Endpoint Pattern: https://{resource-name}.openai.azure.com/openai/deployments/{deployment-name}/videos Parameters: Uses OpenAI-style naming like seconds (string), size (combined dimension string like "1280x720") Job Management: Uses /videos/{video_id} for status polling Content Download: Uses /videos/{video_id}/content Why This Matters? This architectural difference requires conditional request formatting in your code: const isSora2 = deployment.toLowerCase().includes('sora-2'); if (isSora2) { requestBody = { model: deployment, prompt, size: `${width}x${height}`, // Combined format seconds: duration.toString(), // String type }; } else { requestBody = { model: deployment, prompt, height, // Separate dimensions width, n_seconds: duration.toString(), // Azure naming n_variants: variants, }; } API Integration: Request Body Parameters Sora 1 API Parameters Standard Text-to-Video Request: { "model": "sora-1", "prompt": "Wide shot of a child flying a red kite in a grassy park, golden hour sunlight, camera slowly pans upward.", "height": "720", "width": "1280", "n_seconds": "12", "n_variants": "2" } Parameter Details: model (String, Required): Your Azure deployment name prompt (String, Required): Natural language description of the video (max 32000 chars) height (String, Required): Video height in pixels width (String, Required): Video width in pixels n_seconds (String, Required): Duration (1-20 seconds) n_variants (String, Optional): Number of variations to generate (1-4, constrained by resolution) Sora 2 API Parameters Text-to-Video Request: { "model": "sora-2", "prompt": "A serene mountain landscape with cascading waterfalls, cinematic drone shot", "size": "1280x720", "seconds": "12" } Image-to-Video Request (uses FormData): const formData = new FormData(); formData.append('model', 'sora-2'); formData.append('prompt', 'Animate this image with gentle wind movement'); formData.append('size', '1280x720'); formData.append('seconds', '8'); formData.append('input_reference', imageFile); // JPEG/PNG/WebP Video-to-Video Remix Request: Endpoint: POST .../videos/{video_id}/remix Body: Only { "prompt": "your new description" } The original video's structure, motion, and framing are reused while applying the new prompt Parameter Details: model (String, Optional): Your deployment name prompt (String, Required): Video description size (String, Optional): Either "720x1280" or "1280x720" (defaults to "720x1280") seconds (String, Optional): "4", "8", or "12" (defaults to "4") input_reference (File, Optional): Reference image for image-to-video mode remix_video_id (String, URL parameter): ID of video to remix Video Generation Modes 1. Text-to-Video (Both Models) The foundational mode where you provide a text prompt describing the desired video. Implementation: const response = await fetch(endpoint, { method: 'POST', headers: { 'Content-Type': 'application/json', 'api-key': apiKey, }, body: JSON.stringify({ model: deployment, prompt: "A train journey through mountains with dramatic lighting", size: "1280x720", seconds: "12", }), }); Best Practices: Include shot type (wide, close-up, aerial) Describe subject, action, and environment Specify lighting conditions (golden hour, dramatic, soft) Add camera movement if desired (pans, tilts, tracking shots) 2. Image-to-Video (Sora 2 Only) Generate a video anchored to or starting from a reference image. Key Requirements: Supported formats: JPEG, PNG, WebP Image dimensions must exactly match the selected video resolution Our implementation automatically resizes uploaded images to match Implementation Detail: // Resize image to match video dimensions const targetWidth = parseInt(width); const targetHeight = parseInt(height); const resizedImage = await resizeImage(inputReference, targetWidth, targetHeight); // Send as multipart/form-data formData.append('input_reference', resizedImage); 3. Video-to-Video Remix (Sora 2 Only) Create variations of existing videos while preserving their structure and motion. Use Cases: Change weather conditions in the same scene Modify time of day while keeping camera movement Swap subjects while maintaining composition Adjust artistic style or color grading Endpoint Structure: POST {base_url}/videos/{original_video_id}/remix?api-version=2024-08-01-preview Implementation: let requestEndpoint = endpoint; if (isSora2 && remixVideoId) { const [baseUrl, queryParams] = endpoint.split('?'); const root = baseUrl.replace(/\/videos$/, ''); requestEndpoint = `${root}/videos/${remixVideoId}/remix${queryParams ? '?' + queryParams : ''}`; } Cost Analysis per Generation Sora 1 Pricing Model Base Rate: ~$0.05 per second per variant at 720p Resolution Scaling: Cost scales linearly with pixel count Formula: const basePrice = 0.05; const basePixels = 1280 * 720; // Reference resolution const currentPixels = width * height; const resolutionMultiplier = currentPixels / basePixels; const totalCost = basePrice * duration * variants * resolutionMultiplier; Examples: 720p (1280×720), 12 seconds, 1 variant: $0.60 1080p (1920×1080), 12 seconds, 1 variant: $1.35 720p, 12 seconds, 2 variants: $1.20 Sora 2 Pricing Model Flat Rate: $0.10 per second per variant (no resolution scaling in public preview) Formula: const totalCost = 0.10 * duration * variants; Examples: 720p (1280×720), 4 seconds: $0.40 720p (1280×720), 12 seconds: $1.20 720p (720×1280), 8 seconds: $0.80 Note: Since Sora 2 currently only supports 720p in public preview, resolution doesn't affect cost, only duration matters. Cost Comparison Scenario Sora 1 (720p) Sora 2 (720p) Winner 4s video $0.20 $0.40 Sora 1 12s video $0.60 $1.20 Sora 1 12s + audio N/A (no audio) $1.20 Sora 2 (unique) Image-to-video N/A $0.40-$1.20 Sora 2 (unique) Recommendation: Use Sora 1 for cost-effective silent videos at various resolutions. Use Sora 2 when you need audio, image/video inputs, or remix capabilities. Technical Limitations & Constraints Sora 1 Limitations Resolution Options: 9 supported resolutions from 480×480 to 1920×1080 Includes square, portrait, and landscape formats Full list: 480×480, 480×854, 854×480, 720×720, 720×1280, 1280×720, 1080×1080, 1080×1920, 1920×1080 Duration: Flexible: 1 to 20 seconds Any integer value within range Variants: Depends on resolution: 1080p: Variants disabled (n_variants must be 1) 720p: Max 2 variants Other resolutions: Max 4 variants Concurrent Jobs: Maximum 2 jobs running simultaneously Job Expiration: Videos expire 24 hours after generation Audio: No audio generation (silent videos only) Sora 2 Limitations Resolution Options (Public Preview): Only 2 options: 720×1280 (portrait) or 1280×720 (landscape) No square formats No 1080p support in current preview Duration: Fixed options only: 4, 8, or 12 seconds No custom durations Defaults to 4 seconds if not specified Variants: Not prominently supported in current API documentation Focus is on single high-quality generations with audio Concurrent Jobs: Maximum 2 jobs (same as Sora 1) Job Expiration: 24 hours (same as Sora 1) Audio: Native audio generation included (dialogue, sound effects, ambience) Shared Constraints Concurrent Processing: Both models enforce a limit of 2 concurrent video jobs per Azure resource. You must wait for one job to complete before starting a third. Job Lifecycle: queued → preprocessing → processing/running → completed Download Window: Videos are available for 24 hours after completion. After expiration, you must regenerate the video. Generation Time: Typical: 1-5 minutes depending on resolution, duration, and API load Can occasionally take longer during high demand Resolution & Duration Support Matrix Sora 1 Support Matrix Resolution Aspect Ratio Max Variants Duration Range Use Case 480×480 Square 4 1-20s Social thumbnails 480×854 Portrait 4 1-20s Mobile stories 854×480 Landscape 4 1-20s Quick previews 720×720 Square 4 1-20s Instagram posts 720×1280 Portrait 2 1-20s TikTok/Reels 1280×720 Landscape 2 1-20s YouTube shorts 1080×1080 Square 1 1-20s Premium social 1080×1920 Portrait 1 1-20s Premium vertical 1920×1080 Landscape 1 1-20s Full HD content Sora 2 Support Matrix Resolution Aspect Ratio Duration Options Audio Generation Modes 720×1280 Portrait 4s, 8s, 12s ✅ Yes Text, Image, Video Remix 1280×720 Landscape 4s, 8s, 12s ✅ Yes Text, Image, Video Remix Note: Sora 2's limited resolution options in public preview are expected to expand in future releases. Implementation Best Practices 1. Job Status Polling Strategy Implement adaptive backoff to avoid overwhelming the API: const maxAttempts = 180; // 15 minutes max let attempts = 0; const baseDelayMs = 3000; // Start with 3 seconds while (attempts < maxAttempts) { const response = await fetch(statusUrl, { headers: { 'api-key': apiKey }, }); if (response.status === 404) { // Job not ready yet, wait longer const delayMs = Math.min(15000, baseDelayMs + attempts * 1000); await new Promise(r => setTimeout(r, delayMs)); attempts++; continue; } const job = await response.json(); // Check completion (different status values for Sora 1 vs 2) const isCompleted = isSora2 ? job.status === 'completed' : job.status === 'succeeded'; if (isCompleted) break; // Adaptive backoff const delayMs = Math.min(15000, baseDelayMs + attempts * 1000); await new Promise(r => setTimeout(r, delayMs)); attempts++; } 2. Handling Different Response Structures Sora 1 Video Download: const generations = Array.isArray(job.generations) ? job.generations : []; const genId = generations[0]?.id; const videoUrl = `${root}/${genId}/content/video`; Sora 2 Video Download: const videoUrl = `${root}/videos/${jobId}/content`; 3. Error Handling try { const response = await fetch(endpoint, fetchOptions); if (!response.ok) { const error = await response.text(); throw new Error(`Video generation failed: ${error}`); } // ... handle successful response } catch (error) { console.error('[VideoGen] Error:', error); // Implement retry logic or user notification } 4. Image Preprocessing for Image-to-Video Always resize images to match the target video resolution: async function resizeImage(file: File, targetWidth: number, targetHeight: number): Promise<File> { return new Promise((resolve, reject) => { const img = new Image(); const canvas = document.createElement('canvas'); const ctx = canvas.getContext('2d'); img.onload = () => { canvas.width = targetWidth; canvas.height = targetHeight; ctx.drawImage(img, 0, 0, targetWidth, targetHeight); canvas.toBlob((blob) => { if (blob) { const resizedFile = new File([blob], file.name, { type: file.type }); resolve(resizedFile); } else { reject(new Error('Failed to create resized image blob')); } }, file.type); }; img.onerror = () => reject(new Error('Failed to load image')); img.src = URL.createObjectURL(file); }); } 5. Cost Tracking Implement cost estimation before generation and tracking after: // Pre-generation estimate const estimatedCost = calculateCost(width, height, duration, variants, soraVersion); // Save generation record await saveGenerationRecord({ prompt, soraModel: soraVersion, duration: parseInt(duration), resolution: `${width}x${height}`, variants: parseInt(variants), generationMode: mode, estimatedCost, status: 'queued', jobId: job.id, }); // Update after completion await updateGenerationStatus(jobId, 'completed', { videoId: finalVideoId }); 6. Progressive User Feedback Provide detailed status updates during the generation process: const statusMessages: Record<string, string> = { 'preprocessing': 'Preprocessing your request...', 'running': 'Generating video...', 'processing': 'Processing video...', 'queued': 'Job queued...', 'in_progress': 'Generating video...', }; onProgress?.(statusMessages[job.status] || `Status: ${job.status}`); Conclusion Building with Azure OpenAI's Sora models requires understanding the nuanced differences between Sora 1 and Sora 2, both in API structure and capabilities. Key takeaways: Choose the right model: Sora 1 for resolution flexibility and cost-effectiveness; Sora 2 for audio, image inputs, and remix capabilities Handle API differences: Implement conditional logic for parameter formatting and status polling based on model version Respect limitations: Plan around concurrent job limits, resolution constraints, and 24-hour expiration windows Optimize costs: Calculate estimates upfront and track actual usage for better budget management Provide great UX: Implement adaptive polling, progressive status updates, and clear error messages The future of AI video generation is exciting, and Azure AI Foundry provides production-ready access to these powerful models. As Sora 2 matures and limitations are lifted (especially resolution options), we'll see even more creative applications emerge. Resources: Azure AI Foundry Sora Documentation OpenAI Sora API Reference Azure OpenAI Service Pricing This blog post is based on real-world implementation experience building LemonGrab, my AI video generation platform that integrates both Sora 1 and Sora 2 through Azure AI Foundry. The code examples are extracted from production usage.376Views0likes0CommentsDP-100 certificate
Hey community! I'm currently preparing the DP-100 certification with a couple of coworkers, two of them already tried a first time and couldn't approve it although they completed the learning path and consistently scored 90+ on the test exam. What they have told me is that the real exam has questions that are a lot harder, and especially, questions that are not really answerable with the materials from the learning path, they say that they saw many deep questions about DevOps and other questions of resources of Azure that are not really a part of Azure Machine Learning or Foundry. I wanted to ask if anyone had a similar experience with this, I thought the exam was centered on the use of azure machine learning (and AI foundry). Can the exam contain questions that are not related with azure machine learning or foundry? Would really appreciate help here! Thanks everyone!163Views1like1CommentDeploy Open Web UI on Azure VM via Docker: A Step-by-Step Guide with Custom Domain Setup.
Introductions Open Web UI (often referred to as "Ollama Web UI" in the context of LLM frameworks like Ollama) is an open-source, self-hostable interface designed to simplify interactions with large language models (LLMs) such as GPT-4, Llama 3, Mistral, and others. It provides a user-friendly, browser-based environment for deploying, managing, and experimenting with AI models, making advanced language model capabilities accessible to developers, researchers, and enthusiasts without requiring deep technical expertise. This article will delve into the step-by-step configurations on hosting OpenWeb UI on Azure. Requirements: Azure Portal Account - For students you can claim $USD100 Azure Cloud credits from this URL. Azure Virtual Machine - with a Linux of any distributions installed. Domain Name and Domain Host Caddy Open WebUI Image Step One: Deploy a Linux – Ubuntu VM from Azure Portal Search and Click on “Virtual Machine” on the Azure portal search bar and create a new VM by clicking on the “+ Create” button > “Azure Virtual Machine”. Fill out the form and select any Linux Distribution image – In this demo, we will deploy Open WebUI on Ubuntu Pro 24.04. Click “Review + Create” > “Create” to create the Virtual Machine. Tips: If you plan to locally download and host open source AI models via Open on your VM, you could save time by increasing the size of the OS disk / attach a large disk to the VM. You may also need a higher performance VM specification since large resources are needed to run the Large Language Model (LLM) locally. Once the VM has been successfully created, click on the “Go to resource” button. You will be redirected to the VM’s overview page. Jot down the public IP Address and access the VM using the ssh credentials you have setup just now. Step Two: Deploy the Open WebUI on the VM via Docker Once you are logged into the VM via SSH, run the Docker Command below: docker run -d --name open-webui --network=host --add-host=host.docker.internal:host-gateway -e PORT=8080 -v open-webui:/app/backend/data --restart always ghcr.io/open-webui/open-webui:dev This Docker command will download the Open WebUI Image into the VM and will listen for Open Web UI traffic on port 8080. Wait for a few minutes and the Web UI should be up and running. If you had setup an inbound Network Security Group on Azure to allow port 8080 on your VM from the public Internet, you can access them by typing into the browser: [PUBLIC_IP_ADDRESS]:8080 Step Three: Setup custom domain using Caddy Now, we can setup a reverse proxy to map a custom domain to [PUBLIC_IP_ADDRESS]:8080 using Caddy. The reason why Caddy is useful here is because they provide automated HTTPS solutions – you don’t have to worry about expiring SSL certificate anymore, and it’s free! You must download all Caddy’s dependencies and set up the requirements to install it using this command: sudo apt install -y debian-keyring debian-archive-keyring apt-transport-https curl -1sLf 'https://dl.cloudsmith.io/public/caddy/stable/gpg.key' | sudo gpg --dearmor -o /usr/share/keyrings/caddy-stable-archive-keyring.gpg curl -1sLf 'https://dl.cloudsmith.io/public/caddy/stable/debian.deb.txt' | sudo tee /etc/apt/sources.list.d/caddy-stable.list sudo apt update && sudo apt install caddy Once Caddy is installed, edit Caddy’s configuration file at: /etc/caddy/Caddyfile , delete everything else in the file and add the following lines: yourdomainname.com { reverse_proxy localhost:8080 } Restart Caddy using this command: sudo systemctl restart caddy Next, create an A record on your DNS Host and point them to the public IP of the server. Step Four: Update the Network Security Group (NSG) To allow public access into the VM via HTTPS, you need to ensure the NSG/Firewall of the VM allow for port 80 and 443. Let’s add these rules into Azure by heading to the VM resources page you created for Open WebUI. Under the “Networking” Section > “Network Settings” > “+ Create port rule” > “Inbound port rule” On the “Destination port ranges” field, type in 443 and Click “Add”. Repeat these steps with port 80. Additionally, to enhance security, you should avoid external users from directly interacting with Open Web UI’s port - port 8080. You should add an inbound deny rule to that port. With that, you should be able to access the Open Web UI from the domain name you setup earlier. Conclusion And just like that, you’ve turned a blank Azure VM into a sleek, secure home for your Open Web UI, no magic required! By combining Docker’s simplicity with Caddy’s “set it and forget it” HTTPS magic, you’ve not only made your app accessible via a custom domain but also locked down security by closing off risky ports and keeping traffic encrypted. Azure’s cloud muscle handles the heavy lifting, while you get to enjoy the perks of a pro setup without the headache. If you are interested in using AI models deployed on Azure AI Foundry on OpenWeb UI via API, kindly read my other article: Step-by-step: Integrate Ollama Web UI to use Azure Open AI API with LiteLLM Proxy4.1KViews2likes1CommentConfigure Embedding Models on Azure AI Foundry with Open Web UI
Introduction Let’s take a closer look at an exciting development in the AI space. Embedding models are the key to transforming complex data into usable insights, driving innovations like smarter chatbots and tailored recommendations. With Azure AI Foundry, Microsoft’s powerful platform, you’ve got the tools to build and scale these models effortlessly. Add in Open Web UI, a intuitive interface for engaging with AI systems, and you’ve got a winning combo that’s hard to beat. In this article, we’ll explore how embedding models on Azure AI Foundry, paired with Open Web UI, are paving the way for accessible and impactful AI solutions for developers and businesses. Let’s dive in! To proceed with configuring the embedding model from Azure AI Foundry on Open Web UI, please firstly configure the requirements below. Requirements: Setup Azure AI Foundry Hub/Projects Deploy Open Web UI – refer to my previous article on how you can deploy Open Web UI on Azure VM. Optional: Deploy LiteLLM with Azure AI Foundry models to work on Open Web UI - refer to my previous article on how you can do this as well. Deploying Embedding Models on Azure AI Foundry Navigate to the Azure AI Foundry site and deploy an embedding model from the “Model + Endpoint” section. For the purpose of this demonstration, we will deploy the “text-embedding-3-large” model by OpenAI. You should be receiving a URL endpoint and API Key to the embedding model deployed just now. Take note of that credential because we will be using it in Open Web UI. Configuring the embedding models on Open Web UI Now head to the Open Web UI Admin Setting Page > Documents and Select Azure Open AI as the Embedding Model Engine. Copy and Paste the Base URL, API Key, the Embedding Model deployed on Azure AI Foundry and the API version (not the model version) into the fields below: Click “Save” to reflect the changes. Expected Output Now let us look into the scenario for when the embedding model configured on Open Web UI and when it is not. Without Embedding Models configured. With Azure Open AI Embedding models configured. Conclusion And there you have it! Embedding models on Azure AI Foundry, combined with the seamless interaction offered by Open Web UI, are truly revolutionizing how we approach AI solutions. This powerful duo not only simplifies the process of building and deploying intelligent systems but also makes cutting-edge technology more accessible to developers and businesses of all sizes. As we move forward, it’s clear that such integrations will continue to drive innovation, breaking down barriers and unlocking new possibilities in the AI landscape. So, whether you’re a seasoned developer or just stepping into this exciting field, now’s the time to explore what Azure AI Foundry and Open Web UI can do for you. Let’s keep pushing the boundaries of what’s possible!1.8KViews0likes0CommentsStep-by-step: Integrate Ollama Web UI to use Azure Open AI API with LiteLLM Proxy
Introductions Ollama WebUI is a streamlined interface for deploying and interacting with open-source large language models (LLMs) like Llama 3 and Mistral, enabling users to manage models, test them via a ChatGPT-like chat environment, and integrate them into applications through Ollama’s local API. While it excels for self-hosted models on platforms like Azure VMs, it does not natively support Azure OpenAI API endpoints—OpenAI’s proprietary models (e.g., GPT-4) remain accessible only through OpenAI’s managed API. However, tools like LiteLLM bridge this gap, allowing developers to combine Ollama-hosted models with OpenAI’s API in hybrid workflows, while maintaining compliance and cost-efficiency. This setup empowers users to leverage both self-managed open-source models and cloud-based AI services. Problem Statement As of February 2025, Ollama WebUI, still do not support Azure Open AI API. The Ollama Web UI only support self-hosted Ollama API and managed OpenAI API service (PaaS). This will be an issue if users want to use Open AI models they already deployed on Azure AI Foundry. Objective To integrate Azure OpenAI API via LiteLLM proxy into with Ollama Web UI. LiteLLM translates Azure AI API requests into OpenAI-style requests on Ollama Web UI allowing users to use OpenAI models deployed on Azure AI Foundry. If you haven’t hosted Ollama WebUI already, follow my other step-by-step guide to host Ollama WebUI on Azure. Proceed to the next step if you have Ollama WebUI deployed already. Step 1: Deploy OpenAI models on Azure Foundry. If you haven’t created an Azure AI Hub already, search for Azure AI Foundry on Azure, and click on the “+ Create” button > Hub. Fill out all the empty fields with the appropriate configuration and click on “Create”. After the Azure AI Hub is successfully deployed, click on the deployed resources and launch the Azure AI Foundry service. To deploy new models on Azure AI Foundry, find the “Models + Endpoints” section on the left hand side and click on “+ Deploy Model” button > “Deploy base model” A popup will appear, and you can choose which models to deploy on Azure AI Foundry. Please note that the o-series models are only available to select customers at the moment. You can request access to the o-series models by completing this request access form, and wait until Microsoft approves the access request. Click on “Confirm” and another popup will emerge. Now name the deployment and click on “Deploy” to deploy the model. Wait a few moments for the model to deploy. Once it successfully deployed, please save the “Target URI” and the API Key. Step 2: Deploy LiteLLM Proxy via Docker Container Before pulling the LiteLLM Image into the host environment, create a file named “litellm_config.yaml” and list down the models you deployed on Azure AI Foundry, along with the API endpoints and keys. Replace "API_Endpoint" and "API_Key" with “Target URI” and “Key” found from Azure AI Foundry respectively. Template for the “litellm_config.yaml” file. model_list: - model_name: [model_name] litellm_params: model: azure/[model_name_on_azure] api_base: "[API_ENDPOINT/Target_URI]" api_key: "[API_Key]" api_version: "[API_Version]" Tips: You can find the API version info at the end of the Target URI of the model's endpoint: Sample Endpoint - https://example.openai.azure.com/openai/deployments/o1-mini/chat/completions?api-version=2024-08-01-preview Run the docker command below to start LiteLLM Proxy with the correct settings: docker run -d \ -v $(pwd)/litellm_config.yaml:/app/config.yaml \ -p 4000:4000 \ --name litellm-proxy-v1 \ --restart always \ ghcr.io/berriai/litellm:main-latest \ --config /app/config.yaml --detailed_debug Make sure to run the docker command inside the directory where you created the “litellm_config.yaml” file just now. The port used to listen for LiteLLM Proxy traffic is port 4000. Now that LiteLLM proxy had been deployed on port 4000, lets change the OpenAI API settings on Ollama WebUI. Navigate to Ollama WebUI’s Admin Panel settings > Settings > Connections > Under the OpenAI API section, write http://127.0.0.1:4000 as the API endpoint and set any key (You must write anything to make it work!). Click on “Save” button to reflect the changes. Refresh the browser and you should be able to see the AI models deployed on the Azure AI Foundry listed in the Ollama WebUI. Now let’s test the chat completion + Web Search capability using the "o1-mini" model on Ollama WebUI. Conclusion Hosting Ollama WebUI on an Azure VM and integrating it with OpenAI’s API via LiteLLM offers a powerful, flexible approach to AI deployment, combining the cost-efficiency of open-source models with the advanced capabilities of managed cloud services. While Ollama itself doesn’t support Azure OpenAI endpoints, the hybrid architecture empowers IT teams to balance data privacy (via self-hosted models on Azure AI Foundry) and cutting-edge performance (using Azure OpenAI API), all within Azure’s scalable ecosystem. This guide covers every step required to deploy your OpenAI models on Azure AI Foundry, set up the required resources, deploy LiteLLM Proxy on your host machine and configure Ollama WebUI to support Azure AI endpoints. You can test and improve your AI model even more with the Ollama WebUI interface with Web Search, Text-to-Image Generation, etc. all in one place.11KViews1like4Comments