VMSS
Step-by-step: Integrate Ollama Web UI to use the Azure OpenAI API with LiteLLM Proxy
Introduction

Ollama WebUI is a streamlined interface for deploying and interacting with open-source large language models (LLMs) such as Llama 3 and Mistral. It lets users manage models, test them in a ChatGPT-like chat environment, and integrate them into applications through Ollama's local API. While it excels for self-hosted models on platforms like Azure VMs, it does not natively support Azure OpenAI API endpoints; OpenAI's proprietary models (e.g., GPT-4) remain accessible only through a managed API. However, tools like LiteLLM bridge this gap, allowing developers to combine Ollama-hosted models with the Azure OpenAI API in hybrid workflows while maintaining compliance and cost efficiency. This setup lets users leverage both self-managed open-source models and cloud-based AI services.

Problem Statement

As of February 2025, Ollama WebUI still does not support the Azure OpenAI API. It supports only the self-hosted Ollama API and the managed OpenAI API service (PaaS). This is an issue for users who want to use OpenAI models they have already deployed on Azure AI Foundry.

Objective

To integrate the Azure OpenAI API into Ollama Web UI via the LiteLLM proxy. LiteLLM translates the OpenAI-style requests coming from Ollama Web UI into Azure OpenAI API requests, allowing users to use OpenAI models deployed on Azure AI Foundry.

If you haven't hosted Ollama WebUI already, follow my other step-by-step guide to host Ollama WebUI on Azure. Proceed to the next step if you already have Ollama WebUI deployed.

Step 1: Deploy OpenAI models on Azure AI Foundry

If you haven't created an Azure AI Hub yet, search for Azure AI Foundry in the Azure portal and click the "+ Create" button > Hub. Fill out all the empty fields with the appropriate configuration and click "Create". After the Azure AI Hub is successfully deployed, click on the deployed resource and launch the Azure AI Foundry service.

To deploy new models on Azure AI Foundry, find the "Models + Endpoints" section on the left-hand side and click the "+ Deploy Model" button > "Deploy base model". A popup will appear where you can choose which models to deploy on Azure AI Foundry. Please note that the o-series models are only available to select customers at the moment; you can request access by completing this request access form and waiting until Microsoft approves the access request.

Click "Confirm" and another popup will appear. Name the deployment and click "Deploy" to deploy the model. Wait a few moments for the model to deploy. Once it has deployed successfully, save the "Target URI" and the API key.
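To confirm the deployment works before involving any other tooling, you can call the endpoint directly. Below is a minimal sketch with curl; the endpoint, deployment name, and key are placeholders, so substitute the Target URI and key you just saved:

```bash
# Placeholder endpoint, deployment name, and key - use your own values
curl "https://example.openai.azure.com/openai/deployments/o1-mini/chat/completions?api-version=2024-08-01-preview" \
  -H "Content-Type: application/json" \
  -H "api-key: YOUR_API_KEY" \
  -d '{"messages": [{"role": "user", "content": "Hello"}]}'
```

A JSON chat-completion response confirms the deployment and key are valid before you configure LiteLLM.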
Step 2: Deploy LiteLLM Proxy via Docker container

Before pulling the LiteLLM image into the host environment, create a file named "litellm_config.yaml" and list the models you deployed on Azure AI Foundry, along with their API endpoints and keys. Replace "API_Endpoint" and "API_Key" with the "Target URI" and "Key" found in Azure AI Foundry, respectively.

Template for the "litellm_config.yaml" file:

```yaml
model_list:
  - model_name: [model_name]
    litellm_params:
      model: azure/[model_name_on_azure]
      api_base: "[API_ENDPOINT/Target_URI]"
      api_key: "[API_Key]"
      api_version: "[API_Version]"
```

Tip: you can find the API version at the end of the Target URI of the model's endpoint. Sample endpoint: https://example.openai.azure.com/openai/deployments/o1-mini/chat/completions?api-version=2024-08-01-preview

Run the Docker command below to start LiteLLM Proxy with the correct settings:

```bash
docker run -d \
  -v $(pwd)/litellm_config.yaml:/app/config.yaml \
  -p 4000:4000 \
  --name litellm-proxy-v1 \
  --restart always \
  ghcr.io/berriai/litellm:main-latest \
  --config /app/config.yaml --detailed_debug
```

Make sure to run the Docker command from the directory where you created the "litellm_config.yaml" file, so the volume mount resolves correctly. LiteLLM Proxy listens on port 4000.

Now that the LiteLLM proxy is deployed on port 4000, let's change the OpenAI API settings in Ollama WebUI. Navigate to Ollama WebUI's Admin Panel > Settings > Connections, and under the OpenAI API section enter http://127.0.0.1:4000 as the API endpoint and set any key (you must enter something to make it work!). Click the "Save" button to apply the changes.

Refresh the browser and you should see the AI models deployed on Azure AI Foundry listed in Ollama WebUI. Now let's test the chat completion + Web Search capability using the "o1-mini" model in Ollama WebUI.
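If the models do not show up, you can sanity-check the proxy from the host before debugging the WebUI. LiteLLM exposes an OpenAI-compatible API, so a request like the following sketch should return a completion (the model name is assumed to match a model_name from your config):

```bash
# "o1-mini" here is whatever model_name you set in litellm_config.yaml
curl http://127.0.0.1:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer any-key" \
  -d '{"model": "o1-mini", "messages": [{"role": "user", "content": "ping"}]}'
```

Since the container was started with --detailed_debug, `docker logs litellm-proxy-v1` is also useful for tracing requests that fail to reach Azure.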
Conclusion

Hosting Ollama WebUI on an Azure VM and integrating it with the Azure OpenAI API via LiteLLM offers a powerful, flexible approach to AI deployment, combining the cost efficiency of open-source models with the advanced capabilities of managed cloud services. While Ollama itself doesn't support Azure OpenAI endpoints, this hybrid architecture empowers IT teams to balance data privacy (via self-hosted models) with cutting-edge performance (via the Azure OpenAI API), all within Azure's scalable ecosystem. This guide covered every step required to deploy your OpenAI models on Azure AI Foundry, set up the required resources, deploy LiteLLM Proxy on your host machine, and configure Ollama WebUI to support Azure AI endpoints. You can test and improve your AI model even further with Ollama WebUI features such as Web Search and text-to-image generation, all in one place.

Automatic scaling with Azure Virtual Machine Scale Sets flexible orchestration mode

At Ignite March 2021, we announced the public preview of Azure Virtual Machine Scale Sets (VMSS) with flexible orchestration mode, an evolution of Azure Virtual Machine Scale Sets that makes it easier to run a variety of virtual machine workloads at high scale with high availability. We are excited to announce additional functionality in the VMSS flexible orchestration preview: automatic scaling.

Automatic scaling

Flexible orchestration mode now allows you to scale your virtual machine application out or in manually, automatically based on metrics, or according to a schedule. As with traditional VMSS in uniform orchestration mode, you specify a virtual machine profile or template for the instances (VM size, networking configuration, data disks, etc.) and the number of instances you would like. Once the profile is defined, the scale set automatically creates the number of instances you request, or removes instances along with their associated NICs and disks. VMSS provides many options to help you scale out based on your application's needs:

- Scale up to 1,000 instances in the scale set
- Specify that instances should be placed in a particular zone
- Spread instances across multiple fault domains
- Automatically scale based on metrics such as aggregate CPU load, disk throughput, and memory usage
- Use Spot or on-demand priority
- Automatically remove NICs and disks when deleting VM instances

When application demand goes down or you need fewer instances for your application, you can save cost by scaling in and reducing the number of instances in your scale set. A sketch of metric-based autoscale rules follows this section.
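As an illustration, the sketch below uses the Azure CLI to attach a CPU-based autoscale setting to a scale set; the resource group and scale set names are placeholders, and the thresholds are examples rather than recommendations:

```bash
# Placeholder names - substitute your own resource group and scale set
az monitor autoscale create \
  --resource-group myResourceGroup \
  --resource myFlexScaleSet \
  --resource-type Microsoft.Compute/virtualMachineScaleSets \
  --name cpuAutoscale \
  --min-count 2 --max-count 10 --count 2

# Scale out by 2 instances when average CPU exceeds 70% over 5 minutes
az monitor autoscale rule create \
  --resource-group myResourceGroup \
  --autoscale-name cpuAutoscale \
  --condition "Percentage CPU > 70 avg 5m" \
  --scale out 2

# Scale back in by 1 instance when average CPU drops below 30%
az monitor autoscale rule create \
  --resource-group myResourceGroup \
  --autoscale-name cpuAutoscale \
  --condition "Percentage CPU < 30 avg 5m" \
  --scale in 1
```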
Faster, more reliable deployments

VMSS flexible orchestration mode is built on our next-generation datacenter deployment technologies, enabling more reliable deployment success, more consistent deployment times, and faster, more reliable scale-out and scale-in operations.

Maintain application health with application monitoring and automatic instance repair

You can install the Application Health extension on each instance so that your application can report application-specific health to Azure. Azure can then automatically remove and replace instances that report an unhealthy application state.

Safely remove instances with terminate notification

Your application can receive an instance termination notice and set a predefined delay on the terminate operation, allowing it to perform any cleanup activities or end-of-life workflow before the instance is deleted.

Application-aware in-guest security patching orchestration

Automatic VM guest patching helps ease update management by safely and automatically patching virtual machines to maintain security compliance. With automatic VM guest patching enabled, the VM is assessed periodically to determine the applicable patches. Updates classified as 'Critical' or 'Security' are automatically downloaded and applied to the VM during off-peak hours. Patch orchestration is managed by Azure, and patches are applied following availability-first principles.

Improve network security with explicit outbound connectivity

Historically, Azure VMs are assigned a default outbound IP address, enabling outbound connectivity from the VM to the internet (default outbound access). This default outbound access IP has several disadvantages, including the inability to lock down access via network security groups and SNAT port exhaustion.

To support modern best practices based on the secure-by-default approach to zero-trust network security, VM instances created with VMSS flexible orchestration will not have the default outbound access IP associated with them. VMSS flexible orchestration requires that you specify an explicit outbound connectivity method, for example (a CLI sketch of the NAT gateway option follows this list):

- Associate a NAT gateway with the subnet where the instances reside
- Associate a Standard load balancer with outbound rules configured
- Associate a public IP with the VM network interface

Only VMs created implicitly by the VMSS scaling engine are secure by default with no implicit IP. VMs associated with an availability set or with VMSS uniform orchestration mode, and standalone VMs that are later added to a VMSS Flex, still have default outbound access and an implicit IP address.

If you are building new workloads for VMSS flexible orchestration, or migrating existing workloads to it, you may need to review your network configuration to ensure connectivity to external services, including:

- Windows activation (Key Management Service)
- Private Link connections to required Azure services such as storage accounts and Azure Key Vault
- Custom scripts that require access to external URIs, Azure Active Directory domain join, etc.
- Windows Update service

For more information, refer to Default Outbound Access.
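For example, the NAT gateway option can be wired up with the Azure CLI as in the minimal sketch below; all resource names are placeholders, and the VNet and subnet are assumed to already exist:

```bash
# Placeholder names - substitute your own resource group, VNet, and subnet
az network public-ip create \
  --resource-group myResourceGroup --name natPublicIP --sku Standard

az network nat gateway create \
  --resource-group myResourceGroup --name myNatGateway \
  --public-ip-addresses natPublicIP

# Associate the NAT gateway with the subnet hosting the scale set instances
az network vnet subnet update \
  --resource-group myResourceGroup --vnet-name myVnet --name mySubnet \
  --nat-gateway myNatGateway
```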
Support for Azure Backup and Azure Site Recovery

We have extended support for VM management services such as Azure Backup and Azure Site Recovery to VMSS flexible orchestration mode.

Example: N-tier application with VMSS flexible orchestration

Let's look at how you can use VMSS flexible orchestration mode to simplify a traditional N-tier application virtual machine architecture, adapted from the Azure Architecture Center: N-tier application with Apache Cassandra.

Traditionally, this application architecture requires that you manage each of the 14 VMs individually: you are responsible for monitoring each instance, performing all security patching, and applying application updates. Furthermore, if demand for your application grows or shrinks, you have to manually create additional instances at the web and/or business tier to handle the additional traffic. You can simplify deployment and management of this architecture by using a VMSS with flexible orchestration at each application tier and relying on VMSS platform features to assist with monitoring and management tasks:

- Data tier: This database workload tends to be stateful and requires that instances are spread across multiple racks or partitions, so you can configure a VMSS with flexible orchestration to spread virtual machines across fault domains (see the sketch after the sample templates link below).
- Business tier: The middle tier of the application is often stateless, so you may be able to use VMSS Flex with maximum spreading (allowing Azure to manage spreading, with no particular quorum requirement). You can take advantage of automatic instance repair to monitor whether application instances report healthy and automatically replace unhealthy instances with new, healthy ones.
- Web tier: This also tends to be a stateless tier and is the most susceptible to dynamic changes in traffic. You can specify autoscaling rules to automatically increase or decrease the number of instances based on a schedule or on metric-based rules. You can help optimize costs by mixing demand types: adding 2-3 instances at full, on-demand pricing and specifying autoscale rules to scale out with less expensive Spot instances.

Sample templates: vm-scale-sets/vmss-flex-n-tier-demo at master · Azure/vm-scale-sets (github.com)
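For the data tier above, creating a Flexible scale set that spreads instances across three fault domains might look like the following sketch; the names, image alias, and VM size are placeholders rather than recommendations:

```bash
# Placeholder names - substitute your own resource group, image, and size
az vmss create \
  --resource-group myResourceGroup \
  --name cassandraScaleSet \
  --orchestration-mode Flexible \
  --platform-fault-domain-count 3 \
  --image Ubuntu2204 \
  --instance-count 3 \
  --vm-sku Standard_D2s_v3
```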
Looking toward General Availability and beyond

We are excited to share this first step in our journey to combine Azure Virtual Machines, availability sets, and VMSS into a single, integrated offering with VMSS flexible orchestration. On the way to general availability, we expect to continue improving parity between VMSS uniform and VMSS flexible orchestration. One feature we plan to add next is the ability to specify multi-zone deployments, so you can automatically spread instances across multiple availability zones. We also look forward to bringing more API parity between VMSS Uniform and Flex for batch instance operations, support for all VM sizes, and VMSS orchestrations such as scale-in policy and instance protection. We look forward to hearing your feedback and stories, so we can continue to help you build the applications and services for your organization.

Resources to get you started: Virtual Machine Scale Sets - learn how to deploy and manage VMSS Flex

A little PS script for getting VM and VMSS member status

I wrote this simple PS script to see the status of our VMs and VMSS members, as I wished to monitor infrastructure with an eye to costs. The VMSS bit took me a while, probably because I'm not very bright, but I thought the construction might help others who need to do tasks on each member of a scale set. YMMV.

```powershell
# Login-AzureRMAccount
# Uncomment above to log in

# Add an object to collect output
$outObj = New-Object PSObject

# What is your tenant ID
$tenantId = "your tenant ID in here"

# First get a list of subscriptions in this tenant
$subList = Get-AzureRmSubscription -TenantId $tenantId

# Then write it to the screen in yellow
Write-Host -ForegroundColor Yellow $subList.name

foreach ($subName in $subList.name)
{
    # Write each sub as we work on it, below the yellow name, in green
    Write-Host -NoNewline -ForegroundColor Green $subName " "

    # Below is an odd construction.
    # I find that if you use Select-AzureRmSubscription with a string subName
    # it does not always work - mostly, but not always.
    # The method below has always worked for me. YMMV
    $subObj = Get-AzureRmSubscription -SubscriptionName $subName
    $subTmp = Select-AzureRmSubscription -SubscriptionId $subObj

    # Get list of resource groups
    $rgList = Get-AzureRmResourceGroup

    # In each RG, first get the VMSSs, then the VMs
    foreach ($rgName in $rgList.ResourceGroupName)
    {
        # The logic below will be useful for anything you need to do on each member of a scale set
        $vmssList = Get-AzureRmVmss -ResourceGroupName $rgName
        foreach ($vmssName in $vmssList.Name)
        {
            $vmssObj = Get-AzureRmVmssVM -InstanceView -Name $vmssName -ResourceGroupName $rgName

            # If you want to use this logic for another task on each member of a scale set,
            # replace what is between {} with your tasks
            foreach ($instCount in $vmssObj.InstanceID)
            {
                $vmssTmp = Get-AzureRmVmssVM -InstanceView -InstanceId $instCount -ResourceGroupName $rgName -VMScaleSetName $vmssName
                $outTmp = $vmssName + "_" + $instCount + "." + $rgName + "." + $subName
                $outObj | Add-Member $outTmp $vmssTmp.Statuses[1].DisplayStatus
            }
        }

        $vmList = Get-AzureRmVM -ResourceGroupName $rgName
        foreach ($vmName in $vmList.Name)
        {
            $vmObj = Get-AzureRmVM -Name $vmName -ResourceGroupName $rgName -Status
            $outTmp = $vmObj.Name + "." + $rgName + "." + $subName
            $outObj | Add-Member $outTmp $vmObj.Statuses[1].DisplayStatus
        }
    }
}

Write-Host # Write-Host for the new line
$outObj    # Output the list
```
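One closing note: the AzureRM cmdlets used above have since been superseded by the Az module, so on a current setup the same idea can be sketched with Az cmdlets. A minimal, hedged equivalent for the standalone-VM half looks like this:

```powershell
# Minimal sketch using the Az module (AzureRM's successor); same idea, newer cmdlets
# Connect-AzAccount   # uncomment to log in

foreach ($sub in Get-AzSubscription) {
    # Switch the session context to this subscription
    Set-AzContext -SubscriptionId $sub.Id | Out-Null
    foreach ($vm in Get-AzVM -Status) {
        # With -Status, each VM in the list view carries a PowerState such as "VM running"
        "{0}.{1}.{2}: {3}" -f $vm.Name, $vm.ResourceGroupName, $sub.Name, $vm.PowerState
    }
}
```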