Educator Developer Blog

Deploy Machine Learning Models the Smart Way with Azure Blob & Web App

MohamedFaraazman
Jun 27, 2025

Deploying machine learning models is one thing; deploying large ML models efficiently is a different challenge altogether. As models grow in size (think Hugging Face Transformers, vision models, or even fine-tuned LLMs), bundling them directly into your app or Docker container becomes impractical and slow. In this post, you’ll learn how to solve this by hosting your model on Azure Blob Storage, loading it dynamically at runtime into your inference app, and deploying the entire application via Azure Web App. This approach keeps your deployments lightweight, allows for easy model updates, and aligns with cloud-native best practices, making it ideal for both experimentation and production scenarios.

💡 Why This Approach?

Traditional deployments often include models inside the app, leading to:

  • Large container sizes
  • Long build times
  • Slow cold starts
  • Painful updates when models change

With Azure Blob Storage, you can offload the model and fetch it only at runtime, reducing size, improving flexibility, and enabling easier updates.

What You’ll Need
  • An ML model (model.pkl, model.pt, etc.)
  • An Azure Blob Storage account
  • A Python web app (FastAPI, Flask, or Streamlit)
  • Azure Web App (App Service for Python)
  • Azure Python SDK: azure-storage-blob
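
If you’re starting from a clean environment, everything can be installed in one go (unpinned here for brevity; pin versions for production):

pip install fastapi uvicorn torch azure-storage-blob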

Step 1: Save and Upload Your Model to Blob Storage

First, save your trained model locally:

# PyTorch example: save the full model object (architecture + weights),
# so torch.load() can restore it directly at runtime. If you prefer
# saving only model.state_dict(), remember to re-instantiate the model
# class and call load_state_dict() before inference.
import torch
torch.save(model, "model.pt")

Then, upload it to Azure Blob Storage:

from azure.storage.blob import BlobServiceClient

# Your storage account connection string (from the Azure Portal)
conn_str = "your_connection_string"
blob_service = BlobServiceClient.from_connection_string(conn_str)

# Assumes a container named "models" already exists in the account
container = blob_service.get_container_client("models")

# Upload the model file; overwrite=True lets you swap in new versions later
with open("model.pt", "rb") as f:
    container.upload_blob(name="model.pt", data=f, overwrite=True)
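
To confirm the upload, you can list what’s in the container using the same SDK:

# Quick sanity check: list the blobs in the "models" container
for blob in container.list_blobs():
    print(blob.name, blob.size)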

Step 2: Build a Lightweight Inference App

Create a simple FastAPI app that loads the model from Blob Storage on startup:

from fastapi import FastAPI
from azure.storage.blob import BlobClient
import io, torch

app = FastAPI()

@app.on_event("startup")
def load_model():
    print("Loading model from Azure Blob Storage...")
    blob = BlobClient.from_connection_string(
        "your_connection_string",
        container_name="models",
        blob_name="model.pt",
    )
    # Download the blob into memory and deserialize the model
    stream = io.BytesIO(blob.download_blob().readall())
    global model
    model = torch.load(stream, map_location="cpu")
    model.eval()

@app.get("/")
def read_root():
    return {"message": "Model loaded and ready!"}

@app.post("/predict")
def predict(data: dict):
    # Example input, dummy output; see Step 3 for the full version
    return {"result": "prediction goes here"}
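
To try this locally before deploying, run the app with uvicorn and send a test request. The payload shape is just a placeholder; adapt it to your model (assumes the file is saved as main.py and the requests package is installed):

# In a terminal: uvicorn main:app --reload
# Then, from another Python session:
import requests

resp = requests.post("http://127.0.0.1:8000/predict", json={"features": [1.0, 2.0]})
print(resp.json())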

Step 3: Push Your App to GitHub and Deploy to Azure (the Model Loads at Runtime!)

Now that your ML model is safely uploaded to Azure Blob Storage (Step 1), it’s time to push your inference app (without the model) to GitHub and deploy it via Azure Web App.

The trick? Your app will dynamically fetch the model from Blob Storage at runtime, keeping your repo light and deployment fast!

3.1 Push Your App Code (Without the Model) to GitHub

Your project structure should look like this:

azure-ml-deploy/
│
├── main.py              # Your FastAPI/Flask app
├── requirements.txt     # Python dependencies
└── README.md            # Optional documentation

🚫 Do NOT include model.pt or any large model files in your GitHub repo!
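
A simple way to enforce this is a .gitignore; a minimal sketch (adjust the patterns to your project):

# .gitignore: keep model artifacts and local secrets out of the repo
*.pt
*.pkl
__pycache__/
.env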

3.2 main.py: Load the Model from Azure Blob Storage at Runtime

Here's your main.py, which automatically pulls the model during startup:

from fastapi import FastAPI
from azure.storage.blob import BlobClient
import io
import os
import torch

app = FastAPI()

@app.on_event("startup")
def load_model():
    print("Loading model from Azure Blob Storage...")
    blob = BlobClient.from_connection_string(
        # Read the connection string from an App Setting (see step 3.4),
        # so no secret is hard-coded in the repo
        conn_str=os.environ["AZURE_STORAGE_CONN_STRING"],
        container_name="models",
        blob_name="model.pt"
    )
    # Download the blob into memory and deserialize the model
    stream = io.BytesIO(blob.download_blob().readall())
    global model
    model = torch.load(stream, map_location="cpu")
    model.eval()

@app.get("/")
def home():
    return {"status": "Model loaded from Azure Blob!"}

@app.post("/predict")
def predict(data: dict):
    # Replace this with your own prediction logic (see the sketch below)
    return {"prediction": "sample output"}
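
As a rough sketch of what real logic might look like, here’s a drop-in replacement for the stub above. The "features" key and the flat list-of-floats input are assumptions about your model, not part of the original app:

@app.post("/predict")
def predict(data: dict):
    # Assumes a request body like {"features": [1.0, 2.0, ...]}
    x = torch.tensor(data["features"], dtype=torch.float32).unsqueeze(0)
    with torch.no_grad():  # no gradients needed for inference
        y = model(x)
    return {"prediction": y.squeeze(0).tolist()}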

3.3 requirements.txt

fastapi
uvicorn
torch
azure-storage-blob

3.4 Deploy to Azure Web App Using Your GitHub Repo

  1. Go to Azure Portal
  2. Create a new Web App
    • Runtime Stack: Python 3.10
    • OS: Linux
  3. Under Deployment > GitHub, connect your GitHub repo
  4. In Configuration > Application Settings, add:
    • AZURE_STORAGE_CONN_STRING = <your-blob-connection-string>

This way, your app doesn’t store any secrets in code.
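
One practical note: Azure App Service’s default Python build can auto-start Flask and Django apps, but FastAPI typically needs an explicit startup command (under Configuration > General settings) plus gunicorn in requirements.txt. A common pattern looks like this:

gunicorn -w 2 -k uvicorn.workers.UvicornWorker main:app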

Benefits of This Setup

  • Clean separation of model and code
  • Smaller, faster deployable packages
  • Easy model updates (just replace the blob!)
  • No need for GPUs or complex infrastructure
  • Ideal for web APIs, dashboards, and even chatbots

Conclusion

In this blog, you learned how to separate your ML model storage from deployment, making your applications faster, cleaner, and more scalable using Microsoft Azure technologies.

By pushing a lightweight API to GitHub and having your application download the model from Azure Blob Storage at runtime, you:

  • Avoid bloated GitHub repos
  • Accelerate deployments via Azure Web App
  • Keep credentials and models secure with Azure App Settings
  • Enable dynamic updates to your model without redeploying your app
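
Once live, a quick smoke test looks just like the local one (replace <your-app-name> with your Web App’s name; the payload is the same illustrative example as before):

import requests

url = "https://<your-app-name>.azurewebsites.net/predict"
resp = requests.post(url, json={"features": [1.0, 2.0]})
print(resp.json())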

This architecture is perfect for real-world, production-grade ML systems, whether you're building prototypes or enterprise-grade APIs.

💡 Final Thought

Decouple. Deploy. Deliver.
With the power of Azure Blob Storage + Azure App Service, you can scale smarter, not heavier.

Happy Building! ✨

If you found this blog helpful or you're working on something similar, I’d love to connect and exchange ideas. Join the Azure AI Foundry community, or reach out to me on LinkedIn: Mohamed Faraazman Bin Farooq S | LinkedIn
