Introduction
Machine learning models are only as valuable as the infrastructure that supports them. A model trained in a Jupyter notebook and saved to a shared folder creates a chain of problems: no versioning, no reproducibility, no clear ownership, and no automated path to production. When the data scientist who trained it goes on vacation, nobody knows how to retrain it or where the latest version lives.
A well-designed MLOps pipeline solves all of this. It makes training repeatable, artifacts versioned, and deployment automated — so that the path from code change to live endpoint is a single merge to main.
This post provides a generic, end-to-end pattern covering the full lifecycle:
- Train a scikit-learn model against data in Azure Blob Storage
- Serialize the model as a self-contained pickle bundle
- Register it in an Azure ML Registry for cross-team discovery
- Deploy it to an Azure ML Managed Online Endpoint for real-time scoring
You can adapt this template for any scikit-learn model — classification, regression, clustering, or anomaly detection — by swapping in your own training and scoring scripts.
When to Use This Pattern
This pipeline template is a good fit when:
- Your training data lives in Azure Blob Storage (Parquet, CSV, or similar)
- You use scikit-learn (or any Python ML framework) for model training
- You need versioned model artifacts in a central registry
- You want an automated deployment path to a live scoring endpoint
- Downstream consumers (scoring pipelines, APIs, dashboards) need a reliable handoff mechanism
- You want to eliminate ad-hoc notebook-based training with no versioning or reproducibility
It is not the right fit if you need distributed training (use Azure ML pipelines instead), or if your model requires GPU inference (managed endpoints support GPU, but the config differs from what's shown here).
Architecture Overview
The pipeline follows a four-stage flow:
DevOps Gate → Train & Publish Artifact → Register in ML Registry → Deploy to Managed Endpoint
- DevOps Stage — A required gate that logs the build number and validates the pipeline is running.
- Train Stage — Installs Python dependencies, runs the training script against data in Azure Blob Storage, and publishes the pickle bundle as a pipeline artifact.
- Register Stage — Downloads the artifact and registers it in an Azure ML Registry with automatic versioning.
- Deploy Stage — Creates (or updates) a Managed Online Endpoint and deploys the newly registered model version to it for real-time scoring.
The first three stages run on every push to main. The Deploy stage can be gated with a manual approval if you want human review before going live.
The Training Script
The training script is the core of this pipeline — everything else is orchestration around it. It's a standalone Python CLI that you should be able to run locally before it ever touches a pipeline.
The general shape is:
- Load data from Azure Blob Storage (Parquet, CSV, etc.) using libraries like adlfs and pyarrow.
- Validate the schema — check that expected columns exist, types are correct, and there are enough rows to train on. Fail fast with a clear error message if not.
- Engineer features — compute derived columns, handle missing values, encode categorical variables. This is where most of the domain-specific logic lives.
- Train the model using scikit-learn (or your framework of choice).
- Apply preprocessing (e.g., StandardScaler) and save the preprocessor alongside the model so that scoring uses the exact same transformations.
- Serialize a bundle containing the model, preprocessor, feature column order, and training metadata into a single pickle file.
The script reads storage credentials from environment variables, keeping secrets out of the codebase entirely. It accepts an --output-path argument and writes the serialized bundle to that location — which the pipeline later publishes as an artifact.
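A minimal sketch of this shape, runnable end to end. The names here (`load_training_data`, `FEATURE_COLUMNS`, the two-column toy dataset, the `IsolationForest` choice) are illustrative placeholders, not the post's actual training code — your real script would read Parquet from Blob Storage using the environment-variable credentials:

```python
# train_model.py — minimal sketch; load_training_data and FEATURE_COLUMNS
# are placeholders for your own data loading and feature engineering.
import argparse
import pickle
from datetime import datetime, timezone

import pandas as pd
import sklearn
from sklearn.ensemble import IsolationForest
from sklearn.preprocessing import StandardScaler

FEATURE_COLUMNS = ["amount", "quantity"]  # hypothetical feature set


def load_training_data() -> pd.DataFrame:
    # The real script reads Parquet from Blob Storage (e.g. adlfs + pyarrow),
    # pulling AZURE_STORAGE_ACCOUNT_NAME / AZURE_STORAGE_ACCOUNT_KEY from the
    # environment. A tiny in-memory frame keeps this sketch runnable.
    return pd.DataFrame({"amount": [10.0, 12.0, 11.0, 500.0],
                         "quantity": [1, 2, 1, 40]})


def validate_schema(df: pd.DataFrame) -> pd.DataFrame:
    # Fail fast with a clear message before any training happens.
    missing = [c for c in FEATURE_COLUMNS if c not in df.columns]
    if missing:
        raise ValueError(f"Missing expected columns: {missing}")
    clean = df.dropna(subset=FEATURE_COLUMNS)
    if len(clean) < 4:  # arbitrary minimum for the sketch
        raise ValueError(f"Not enough rows to train on: {len(clean)}")
    return clean


def train_bundle(df: pd.DataFrame) -> dict:
    clean = validate_schema(df)
    # Fit the preprocessor and model, then package everything scoring needs.
    scaler = StandardScaler().fit(clean[FEATURE_COLUMNS])
    model = IsolationForest(random_state=0).fit(scaler.transform(clean[FEATURE_COLUMNS]))
    return {
        "model": model,
        "scaler": scaler,
        "feature_order": FEATURE_COLUMNS,
        "metadata": {
            "trained_at": datetime.now(timezone.utc).isoformat(),
            "source_rows": len(df),
            "clean_rows": len(clean),
            "scikit_learn_version": sklearn.__version__,
        },
    }


def main():
    # Invoked by the pipeline as:
    #   python train_model.py --output-path "$(Build.ArtifactStagingDirectory)/model_bundle.pkl"
    parser = argparse.ArgumentParser()
    parser.add_argument("--output-path", required=True)
    args = parser.parse_args()
    bundle = train_bundle(load_training_data())
    with open(args.output_path, "wb") as f:
        pickle.dump(bundle, f)
```

The key property: `train_bundle` is a pure function from a DataFrame to a bundle dict, so you can exercise it locally (and in unit tests) without any Azure dependencies.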
What Goes in the Bundle
The pickle file isn't just the model — it's a self-contained scoring contract. Here's what's inside and why:
| Key | Type | Purpose |
|---|---|---|
| model | scikit-learn estimator | The trained model (e.g., IsolationForest, RandomForestClassifier) |
| scaler | StandardScaler (or similar) | The exact preprocessor fitted on training data — scoring must use the same transform |
| feature_order | list[str] | Column names in the exact order the model expects — prevents silent column reordering bugs |
| metadata.trained_at | ISO timestamp | When the model was trained — useful for debugging stale predictions |
| metadata.source_rows | int | How many rows were in the raw data — helps detect data pipeline issues |
| metadata.clean_rows | int | How many rows survived cleaning — a sudden drop signals a data quality problem |
| metadata.scikit_learn_version | str | The scikit-learn version used — pickle compatibility can break across major versions |
This structure means any consumer can load the bundle, inspect what's in it, and score new data without knowing anything about how the model was trained.
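A consumer can enforce that contract with a small validation helper before trusting the artifact. A sketch — the required key names match the table above, while the `REQUIRED_*` constants and function names are our own convention:

```python
import pickle

# Contract keys from the bundle table above.
REQUIRED_KEYS = {"model", "scaler", "feature_order", "metadata"}
REQUIRED_METADATA = {"trained_at", "source_rows", "clean_rows", "scikit_learn_version"}


def validate_bundle(bundle: dict) -> dict:
    """Raise a clear error if the bundle doesn't honor the scoring contract."""
    missing = REQUIRED_KEYS - bundle.keys()
    if missing:
        raise ValueError(f"Bundle is missing keys: {sorted(missing)}")
    missing_meta = REQUIRED_METADATA - bundle["metadata"].keys()
    if missing_meta:
        raise ValueError(f"Bundle metadata is missing: {sorted(missing_meta)}")
    if not isinstance(bundle["feature_order"], list) or not bundle["feature_order"]:
        raise ValueError("feature_order must be a non-empty list of column names")
    return bundle


def load_bundle(path: str) -> dict:
    # Only unpickle artifacts produced by your own pipeline (see the
    # security note on pickle in the next section).
    with open(path, "rb") as f:
        return validate_bundle(pickle.load(f))
```

Running this check in both the scoring script's `init()` and any batch consumer turns a vague "scoring broke" failure into a precise "bundle is missing key X" error.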
Choosing a Serialization Format
This template uses pickle, but you should choose based on your needs:
| Format | Best For | Trade-off |
|---|---|---|
| pickle | Bundles with metadata (model + scaler + feature order + config) | Built-in, no extra deps. Not safe to load from untrusted sources. |
| joblib | Large NumPy array-heavy models | Faster for large arrays, but adds a dependency. |
| ONNX | Cross-framework interop (PyTorch ↔ scikit-learn) | Portable, but not all model types are supported. |
Pickle works well when your artifact is a self-contained bundle — model, preprocessor, feature column order, and training metadata in one file. Any consumer who loads it gets everything needed to score new data correctly.
Security note: Never load pickle files from untrusted sources — deserialization can execute arbitrary code. This is safe when the pickle is produced by your own pipeline and stored in an access-controlled registry, but always validate provenance.
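One lightweight way to validate provenance is to record the artifact's SHA-256 digest at publish time and verify it before unpickling. A standard-library-only sketch — storing and distributing the expected digest (e.g. as a pipeline variable or a sidecar file next to the artifact) is an assumption of this example, not something Azure ML does for you:

```python
import hashlib
import pickle


def sha256_of(path: str) -> str:
    """Hash the file in chunks so large artifacts never need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest()


def load_verified_bundle(path: str, expected_sha256: str) -> dict:
    """Refuse to unpickle unless the artifact matches the recorded digest."""
    actual = sha256_of(path)
    if actual != expected_sha256:
        raise ValueError(
            f"Artifact digest mismatch: expected {expected_sha256}, got {actual}"
        )
    with open(path, "rb") as f:
        return pickle.load(f)
```

The check runs before `pickle.load`, so a tampered or wrong file is rejected without ever being deserialized.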
The Pipeline YAML
Here's the full pipeline template. Replace <your-...> placeholders with your values:
```yaml
trigger:
  branches:
    include:
      - main
  paths:
    include:
      - <your-model-source-path>/*   # e.g., src/models/anomaly-detection/*

stages:
  - stage: DevOps
    displayName: Required DevOps Stage
    jobs:
      - job: Echo
        steps:
          - script: echo build initiated - $(Build.BuildNumber)

  - stage: Train
    dependsOn: DevOps
    displayName: 'Train Model & Publish Artifact'
    jobs:
      - job: TrainModel
        steps:
          - checkout: self
          - task: UsePythonVersion@0
            inputs:
              versionSpec: '3.12'   # Use a supported Python version
          - script: |
              python -m pip install --upgrade pip
              pip install -r requirements.txt
            displayName: 'Install Python dependencies'
          - script: |
              python <your-training-script>.py \
                --output-path "$(Build.ArtifactStagingDirectory)/model_bundle.pkl"
            displayName: 'Train model'
            env:
              AZURE_STORAGE_ACCOUNT_NAME: $(AZURE_STORAGE_ACCOUNT_NAME)
              AZURE_STORAGE_ACCOUNT_KEY: $(AZURE_STORAGE_ACCOUNT_KEY)   # See note on Managed Identity below
          - task: PublishPipelineArtifact@1   # Use the modern task
            inputs:
              artifactName: 'model-pkl'
              targetPath: '$(Build.ArtifactStagingDirectory)/model_bundle.pkl'

  - stage: Register
    dependsOn: Train
    displayName: 'Register Model in ML Registry'
    jobs:
      - job: RegisterModel
        steps:
          - task: DownloadPipelineArtifact@2   # Use the modern task
            inputs:
              artifactName: 'model-pkl'
              targetPath: '$(System.ArtifactsDirectory)/model-pkl'
          - task: AzureCLI@2
            displayName: 'Register model in ML Registry'
            inputs:
              azureSubscription: '<your-service-connection>'
              scriptType: 'ps'
              scriptLocation: 'inlineScript'
              inlineScript: |
                az extension add -n ml --yes
                az ml model create `
                  --name <your-model-name> `
                  --path "$(System.ArtifactsDirectory)/model-pkl/model_bundle.pkl" `
                  --type custom_model `
                  --registry-name <your-ml-registry> `
                  --resource-group <your-resource-group>
```
Placeholder Reference
| Placeholder | Description | Example |
|---|---|---|
| <your-model-source-path> | Path to your model code in the repo | src/models/anomaly-detection |
| <your-training-script> | Your Python training script | train_model.py |
| <your-service-connection> | Azure DevOps service connection name | prod-ml-connection |
| <your-model-name> | Name for the model in the registry | sales-anomaly-detector |
| <your-ml-registry> | Azure ML Registry name | contoso-ml-registry |
| <your-resource-group> | Resource group containing the registry | rg-ml-prod |
Key Design Decisions
Credentials as environment variables — Storage credentials are stored in an Azure DevOps variable group and injected via the env: block. They never appear on the command line or in logs.
Prefer Managed Identity over keys. The template above shows AZURE_STORAGE_ACCOUNT_KEY for simplicity, but the recommended approach is to authenticate using a User Managed Identity (UMI) with the Storage Blob Data Reader role. This eliminates key rotation and reduces the credential surface. If your agent supports Managed Identity (e.g., self-hosted on an Azure VM), use DefaultAzureCredential in your training script instead of account keys.
Separate Train and Register stages — The training artifact is published as a pipeline artifact between stages. This means if registration fails, you don't have to retrain. It also gives you a downloadable artifact in Azure DevOps for debugging.
az ml model create with --registry-name — This registers the model in an Azure ML Registry (not a workspace). Registries are shared across workspaces and teams, making the model accessible to anyone with the right permissions.
Auto-versioning — Each az ml model create call with the same --name automatically increments the version number in the registry. No manual version management needed.
Permissions
The pipeline authenticates using a User Managed Identity (UMI) linked to an Azure DevOps service connection via workload identity federation. The UMI needs:
| Role | Scope | Purpose |
|---|---|---|
| Storage Blob Data Reader | Storage account or container | Read training data |
| AzureML Registry User | ML Registry | Register model artifacts |
| AzureML Data Scientist | ML Workspace | Create/update managed endpoints and deployments |
No Contributor or Owner access at the subscription or resource group level is required. Least-privilege access keeps the blast radius small.
Workload Identity Federation vs. secrets: If your Azure DevOps service connection uses workload identity federation (recommended), the UMI authenticates without any stored secrets. If using a service principal with client secret instead, store the secret in an Azure DevOps variable group marked as secret, and rotate it regularly.
Common Pitfalls
These are issues you'll likely hit when adapting this template:
Column name mismatches. Parquet files may have column names like periodid while your script expects Period ID. Add a case-insensitive column rename mapping in your training script and validate the data schema before training starts.
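The rename mapping can be as simple as normalizing both sides before matching. A sketch with pandas — the column names in `CANONICAL_COLUMNS` are illustrative, not from the post's actual dataset:

```python
import pandas as pd

# Map normalized (lowercase, separators stripped) names to the canonical
# names the training script expects. Extend this as new aliases appear.
CANONICAL_COLUMNS = {
    "periodid": "Period ID",
    "salesamount": "Sales Amount",
}


def normalize(name: str) -> str:
    """Case- and separator-insensitive key: 'Sales_Amount' -> 'salesamount'."""
    return name.lower().replace(" ", "").replace("_", "")


def rename_to_canonical(df: pd.DataFrame) -> pd.DataFrame:
    """Rename any recognized alias; leave unknown columns untouched."""
    mapping = {
        col: CANONICAL_COLUMNS[normalize(col)]
        for col in df.columns
        if normalize(col) in CANONICAL_COLUMNS
    }
    return df.rename(columns=mapping)
```

Run this immediately after loading the Parquet/CSV, then validate the schema against the canonical names, so every downstream step sees one consistent set of columns.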
Windows agents use cmd.exe, not bash. If your pipeline runs on self-hosted Windows agents, backslash line continuations and bash-style commands won't work. Use single-line commands or PowerShell syntax, and use Windows-style path separators.
checkout: self vs named repositories. When your pipeline YAML lives in the same repo as your training code, always use checkout: self. A named repository checkout pulls the default branch, not the feature branch you're testing — leading to stale code running in your pipeline.
Start with the training script, not the pipeline. Get your training script working locally first. The pipeline is just orchestration — if the script doesn't work on your machine, it won't work in the pipeline either.
Pin your dependencies. Use a requirements.txt with pinned versions rather than inline pip install with unpinned packages. A scikit-learn minor version bump can change model behavior silently.
Deploying to a Managed Online Endpoint
Registering the model in the Azure ML Registry makes it discoverable. But for real-time scoring — where an API, dashboard, or another service sends data and gets predictions back — you need to deploy the model to a Managed Online Endpoint.
Azure ML Managed Online Endpoints handle the infrastructure: provisioning compute, load balancing, scaling, health probes, and rolling deployments. You provide the model and a scoring script.
HTTP Request (JSON) → Managed Online Endpoint → Deployment (blue) → score.py [init() / run()] + model.pkl → JSON Response (predictions)
Key concepts:
- An endpoint is the HTTPS URL that clients call. It has auth (key or AAD token) and a DNS name.
- A deployment sits behind the endpoint and runs your scoring code + model on provisioned compute.
- You can have multiple deployments (e.g., blue and green) behind one endpoint for A/B testing or canary rollouts, controlled by traffic splitting.
The Scoring Script
The scoring script is the glue between the endpoint and your pickle bundle. Azure ML calls init() once when the container starts, and run() on every incoming request.
```python
# score.py — deployed alongside the model
import json
import os
import pickle

import pandas as pd


def init():
    """Called once when the endpoint container starts."""
    global model_bundle
    model_path = os.path.join(os.getenv("AZUREML_MODEL_DIR"), "model_bundle.pkl")
    with open(model_path, "rb") as f:
        model_bundle = pickle.load(f)
    print(f"Model loaded. Trained at: {model_bundle['metadata']['trained_at']}")
    print(f"Expected features: {model_bundle['feature_order']}")


def run(raw_data):
    """Called on every scoring request."""
    try:
        data = json.loads(raw_data)
        df = pd.DataFrame(data["input_data"])
        # Enforce feature order from the bundle
        df = df[model_bundle["feature_order"]]
        # Apply the same scaler used during training
        scaled = model_bundle["scaler"].transform(df)
        # Predict
        predictions = model_bundle["model"].predict(scaled)
        return json.dumps({
            "predictions": predictions.tolist(),
            "model_version": model_bundle["metadata"].get("scikit_learn_version", "unknown"),
        })
    except KeyError as e:
        return json.dumps({"error": f"Missing expected column: {e}"})
    except Exception as e:
        return json.dumps({"error": str(e)})
```
Key things to notice:
- AZUREML_MODEL_DIR — Azure ML automatically downloads the model artifact from the registry and sets this environment variable to the local path. You never deal with storage URLs in scoring code.
- Feature order enforcement — df[model_bundle["feature_order"]] ensures columns are in the exact order the model was trained on, even if the caller sends them in a different order.
- Same scaler — The StandardScaler from the bundle is reused, so the numerical scaling matches training exactly. This is why we bundle the scaler with the model.
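From the client side, scoring is a plain HTTPS POST with a JSON body and the endpoint key as a bearer token. A stdlib-only sketch — the scoring URI and key are placeholders you would retrieve from the endpoint's details in Azure ML, and the `{"input_data": [...]}` shape matches what `run()` above parses:

```python
import json
import urllib.request


def build_scoring_request(scoring_uri: str, api_key: str, rows: list) -> urllib.request.Request:
    """Assemble the POST request that run() in score.py expects."""
    body = json.dumps({"input_data": rows}).encode("utf-8")
    return urllib.request.Request(
        scoring_uri,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # key auth uses a Bearer header
        },
        method="POST",
    )


def score(scoring_uri: str, api_key: str, rows: list) -> dict:
    """Send rows to the endpoint and return the parsed JSON response."""
    request = build_scoring_request(scoring_uri, api_key, rows)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read())
```

Separating request construction from transport means the payload shape and headers can be unit-tested without a live endpoint.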
The Deploy Stage in the Pipeline
Add this stage after the Register stage. All endpoint and deployment configuration is done inline via az ml CLI parameters — no separate YAML config files needed:
```yaml
- stage: Deploy
  dependsOn: Register
  displayName: 'Deploy to Managed Endpoint'
  jobs:
    - job: DeployModel
      steps:
        - checkout: self   # to access score.py
        - task: AzureCLI@2
          displayName: 'Create or update endpoint'
          inputs:
            azureSubscription: '<your-service-connection>'
            scriptType: 'ps'
            scriptLocation: 'inlineScript'
            inlineScript: |
              az extension add -n ml --yes
              # Create endpoint if it doesn't exist (idempotent)
              $exists = az ml online-endpoint show `
                --name <your-endpoint-name> `
                --resource-group <your-resource-group> `
                --workspace-name <your-workspace> 2>$null
              if (-not $exists) {
                az ml online-endpoint create `
                  --name <your-endpoint-name> `
                  --auth-mode key `
                  --resource-group <your-resource-group> `
                  --workspace-name <your-workspace>
              }
        - task: AzureCLI@2
          displayName: 'Deploy model to endpoint'
          inputs:
            azureSubscription: '<your-service-connection>'
            scriptType: 'ps'
            scriptLocation: 'inlineScript'
            inlineScript: |
              az extension add -n ml --yes
              az ml online-deployment create `
                --name blue `
                --endpoint-name <your-endpoint-name> `
                --model azureml://registries/<your-ml-registry>/models/<your-model-name>/versions/<version-number> `
                --code-path ./scoring `
                --scoring-script score.py `
                --environment-image mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu22.04:latest `
                --instance-type Standard_DS3_v2 `
                --instance-count 1 `
                --resource-group <your-resource-group> `
                --workspace-name <your-workspace> `
                --all-traffic
        - task: AzureCLI@2
          displayName: 'Smoke test the endpoint'
          inputs:
            azureSubscription: '<your-service-connection>'
            scriptType: 'ps'
            scriptLocation: 'inlineScript'
            inlineScript: |
              az extension add -n ml --yes
              # Send a test request to verify the deployment is healthy
              az ml online-endpoint invoke `
                --name <your-endpoint-name> `
                --resource-group <your-resource-group> `
                --workspace-name <your-workspace> `
                --request-file scoring/sample-request.json
```
Version pinning is critical. The scikit-learn version in your scoring environment must match the version used during training. Pickle deserialization can fail or produce wrong results if the versions differ.
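A cheap guard is to compare the bundle's recorded scikit-learn version against the runtime's inside `init()` and fail loudly on a mismatch. A sketch — treating matching major.minor as "compatible" is a pragmatic convention of this example, not an official scikit-learn guarantee:

```python
def versions_compatible(trained: str, runtime: str) -> bool:
    """Treat versions as compatible when major.minor match (e.g. 1.4.2 vs 1.4.0)."""
    return trained.split(".")[:2] == runtime.split(".")[:2]


def check_bundle_version(bundle: dict, runtime_version: str) -> None:
    """Fail container startup rather than serve silently wrong predictions."""
    trained = bundle.get("metadata", {}).get("scikit_learn_version", "unknown")
    if trained == "unknown" or not versions_compatible(trained, runtime_version):
        raise RuntimeError(
            f"scikit-learn mismatch: bundle trained with {trained}, "
            f"runtime has {runtime_version}"
        )
```

In `init()`, call `check_bundle_version(model_bundle, sklearn.__version__)` right after unpickling; a startup failure surfaces in the endpoint's health probes instead of as corrupted predictions.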
Deploy Stage Placeholder Reference
| Placeholder | Description | Example |
|---|---|---|
| <your-endpoint-name> | Unique endpoint name (DNS-safe) | anomaly-scoring-endpoint |
| <your-workspace> | Azure ML Workspace name | ml-workspace-prod |
Complete Pipeline — All Four Stages
Here's the full pipeline structure showing how Train, Register, and Deploy connect:
```yaml
stages:
  - stage: DevOps      # Gate
  - stage: Train       # Train model → publish pickle artifact
    dependsOn: DevOps
  - stage: Register    # Register pickle in Azure ML Registry
    dependsOn: Train
  - stage: Deploy      # Deploy to Managed Online Endpoint
    dependsOn: Register
    # Optional: add a manual approval gate here
    # condition: and(succeeded(), eq(variables['Build.SourceBranch'], 'refs/heads/main'))
```
Each stage is independently retriable. If Deploy fails, you don't retrain or re-register — you just redeploy.
Extending This Template
Once the base pipeline is working, consider these additions:
- Model validation stage — Add a stage between Register and Deploy that runs the model against a holdout set and gates deployment on a minimum performance threshold.
- Batch scoring pipeline — A separate pipeline or Azure Function loads the model from the registry and scores large datasets on a schedule using Azure ML Batch Endpoints.
- Monitoring — Use Azure ML model monitoring to track data drift and prediction distributions over time. Trigger retraining automatically when drift exceeds a threshold.
- Multi-environment promotion — Register to a dev registry first, deploy to a staging endpoint, run integration tests, then promote to production.
- A/B testing — Use traffic splitting to evaluate a new model version against the current one on live traffic before committing.
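The model-validation idea from the first bullet reduces to a small gate function: the stage scores a holdout set, computes metrics, and fails the build when any metric dips below its threshold. A sketch — the metric names and threshold values are illustrative, and how you compute the metrics is up to your model type:

```python
def passes_gate(metrics: dict, thresholds: dict) -> list:
    """Return the list of failed checks; an empty list means deployment may proceed."""
    failures = []
    for name, minimum in thresholds.items():
        value = metrics.get(name)
        if value is None:
            failures.append(f"{name}: metric missing")
        elif value < minimum:
            failures.append(f"{name}: {value:.3f} < required {minimum:.3f}")
    return failures
```

In the pipeline, the validation stage would call this and `sys.exit(1)` when the list is non-empty, so a failed gate stops the Deploy stage from ever running.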
Conclusion
An end-to-end MLOps pipeline doesn't need to be complex. The core pattern is:
- Train — Run the training script, serialize the model bundle
- Register — Push to Azure ML Registry with automatic versioning
- Deploy — Create/update a Managed Online Endpoint with the new version
- Score — Clients call a standard HTTPS API, the endpoint handles scaling
The value comes from making this repeatable and removing manual steps. Every push to main trains a fresh model, registers it, and deploys it to a live endpoint — with a rollback path through blue-green deployments if anything goes wrong.
Copy this template, replace the <your-...> placeholders, write your training script and scoring script, and you have a production-grade MLOps pipeline. The structure stays the same regardless of whether you're deploying an anomaly detector, a classifier, or a regression model.