Building production-ready computer vision models has never been easier. Here’s how Azure AutoML transforms image classification from complex to simple. Azure AutoML for Images is a capability of Azure Machine Learning.
1. The Challenge of Traditional Image Classification
Anyone who has worked with computer vision knows the drill: you need to classify images, so you dive into TensorFlow or PyTorch, spend days architecting a convolutional neural network, experiment with dozens of hyperparameters, and hope your model generalizes well. It’s time-consuming, requires deep expertise, and often feels like searching for a needle in a haystack.
What if there was a better way?
2. Enter Azure AutoML for Images
Azure AutoML for Images is a game-changer in the computer vision space. It’s a feature within Azure Machine Learning that automatically builds high-quality vision models from your image data with minimal code. Think of it as having an experienced ML engineer working alongside you, handling all the heavy lifting while you focus on your business problem.
What Makes AutoML for Images Special?
1. Automatic Model Selection
Instead of manually choosing between ResNet, EfficientNet, or dozens of other architectures, AutoML for Images evaluates multiple state-of-the-art deep learning models and selects the best one for your specific dataset. It’s like having access to an entire model zoo with an intelligent curator.
2. Intelligent Hyperparameter Tuning
The system doesn’t just pick a model — it optimizes it. Learning rates, batch sizes, augmentation strategies, and more are automatically tuned to squeeze out the best possible performance. What would take weeks of manual experimentation happens in hours.
3. Built-in Best Practices
Data preprocessing, augmentation techniques, and training strategies that would require extensive domain knowledge are pre-configured and applied automatically. You get enterprise-grade ML without needing to be an ML expert.
Key Capabilities
The repository demonstrates several powerful features:
- Multi-class and Multi-label Classification: Whether you need to classify an image into a single category or tag it with multiple labels, AutoML manages both scenarios seamlessly.
- Format Flexibility: Works with standard image formats including JPEG and PNG, making it easy to integrate with existing datasets.
- Full Transparency: Unlike black-box solutions, you maintain complete visibility and control over the training process. You can monitor metrics, understand model decisions, and fine-tune as needed.
- Production-Ready Deployment: Once trained, models can be easily deployed to Azure endpoints, ready to serve predictions at scale.
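For the multi-label case, the v2 SDK exposes a separate factory function alongside the multi-class one. Here is a minimal, non-runnable configuration sketch; the compute name, experiment name, and data inputs are placeholders, to be set up as in the walkthrough later in this post:

```python
from azure.ai.ml import automl

# Multi-label variant: each image can carry several tags at once.
# Compute, experiment name, and data inputs below are placeholders.
multilabel_job = automl.image_classification_multilabel(
    compute="gpucluster",
    experiment_name="metal-defects-multilabel",
    training_data=my_training_data_input,
    validation_data=my_validation_data_input,
    target_column_name="label",  # Column holding the list of labels per image
)
multilabel_job.set_limits(timeout_minutes=60)
```

The rest of the workflow (limits, sweeps, deployment) is the same as for multi-class jobs.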
Real-World Applications
The practical applications are vast:
- E-commerce: Automatically categorize product images for better search and recommendations.
- Healthcare: Classify medical images for diagnostic support.
- Manufacturing: Detect defects in production line images.
- Agriculture: Identify crop diseases or estimate yield from aerial imagery.
- Content Moderation: Automatically flag inappropriate visual content.
3. A Practical Example: Metal Defect Detection
The repository includes a complete end-to-end example of detecting defects in metal surfaces — a critical quality control task in manufacturing. The notebooks demonstrate how to:
- Download and organize image data from sources like Kaggle,
- Create training and validation splits with proper directory structure,
- Upload data to Azure ML as versioned datasets,
- Configure GPU compute that scales based on demand,
- Train multiple models with automated hyperparameter tuning,
- Evaluate results with comprehensive metrics and visualizations,
- Deploy the best model as a production-ready REST API,
- Export to ONNX for edge deployment scenarios.
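The split step in the list above is plain Python. As an illustration, here is a small, hypothetical sketch (folder names and ratios are assumptions, not taken from the notebooks) that shuffles each class folder into `train/` and `val/` subdirectories:

```python
import random
import shutil
from pathlib import Path

def split_dataset(source_dir: str, output_dir: str,
                  val_ratio: float = 0.2, seed: int = 42) -> None:
    """Copy images from source_dir/<class>/ into output_dir/train/<class>/
    and output_dir/val/<class>/ using a reproducible random split."""
    rng = random.Random(seed)
    for class_dir in Path(source_dir).iterdir():
        if not class_dir.is_dir():
            continue
        images = sorted(class_dir.glob("*"))
        rng.shuffle(images)
        n_val = int(len(images) * val_ratio)
        for split, files in (("val", images[:n_val]), ("train", images[n_val:])):
            dest = Path(output_dir) / split / class_dir.name
            dest.mkdir(parents=True, exist_ok=True)
            for f in files:
                shutil.copy2(f, dest / f.name)
```

This directory layout (one subfolder per class) is exactly what the upload step expects.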
The metal defect use case is particularly instructive because it mirrors real industrial applications where quality control is critical but expertise is scarce. The notebooks show how a small team can build production-grade computer vision systems without a dedicated ML research team.
Getting Started: What You Need
The prerequisites are straightforward:
- An Azure subscription (free tier available for experimentation)
- An Azure Machine Learning workspace
- Python 3.7 or later
That’s it. No local GPU clusters to configure, no complex deep learning frameworks to master.
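You will also need the Python packages that the examples below import; assuming the Azure ML v2 SDK, the install is typically:

```shell
pip install azure-ai-ml azure-identity
```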
Repository Structure
The repository is thoughtfully organized into three progressive notebooks:
- Downloading images.ipynb
- Shows how to acquire and prepare image datasets
- Demonstrates proper directory structure for classification tasks
- Includes data exploration and visualization techniques
- Azure ML AutoML for Images.ipynb
- The core workflow: connect to Azure ML, upload data, configure training
- Covers both simple model training and advanced hyperparameter tuning
- Shows how to evaluate models and select the best performing one
- Demonstrates deployment to managed online endpoints
- Edge with ONNX local model.ipynb
- Exports trained models to ONNX format
- Shows how to run inference locally without cloud connectivity
- Perfect for edge computing and IoT scenarios
Each Python notebook is self-contained with clear explanations, making it easy to understand each step of the process. You can run them sequentially to build a complete solution, or jump to specific sections relevant to your use case.
The Developer Experience
What sets this approach apart is the developer experience. The repository provides Python notebooks that guide you through the entire workflow. You’re not just reading documentation — you’re working with practical, runnable examples that demonstrate real scenarios.
Let’s walk through the code to see how straightforward this actually is.
Use-case description
This image classification model is designed to identify and classify defects on metal surfaces in a manufacturing context.
We want to classify defective images into Crazing, Inclusion, Patches, Pitted, Rolled & Scratches.
All code and images are available in the retkowsky/image-classification-azure-automl-for-images repository (Azure AutoML for Images — Image classification).
Step 1: Connect to Azure ML Workspace
First, establish connection to your Azure ML workspace using Azure credentials:
import os

from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

print("Connection to the Azure ML workspace…")

# Authenticate and build the workspace client
credential = DefaultAzureCredential()
ml_client = MLClient(
    credential,
    os.getenv("subscription_id"),
    os.getenv("resource_group"),
    os.getenv("workspace"),
)
print("✅ Done")
That’s it.
Step 2: Upload Your Dataset
Upload your image dataset to Azure ML. The code handles this elegantly:
from azure.ai.ml.constants import AssetTypes
from azure.ai.ml.entities import Data

# Register the local image folder as a versioned data asset
my_images = Data(
    path=TRAIN_DIR,
    type=AssetTypes.URI_FOLDER,
    description="Metal defects images for image classification",
    name="metaldefectimagesds",
)
uri_folder_data_asset = ml_client.data.create_or_update(my_images)

print("🖼️ Information:")
print(uri_folder_data_asset)
print("\n🖼️ Path to folder in Blob Storage:")
print(uri_folder_data_asset.path)
Your local images are now versioned data assets in Azure, ready for training.
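Because the asset is versioned, any later experiment can pin an exact snapshot. A short sketch using the SDK’s `data.get` call (the version string `"1"` is illustrative; list the actual versions in Azure ML Studio):

```python
# Retrieve a specific version of the registered data asset.
# Requires the ml_client created in Step 1; version "1" is illustrative.
data_asset = ml_client.data.get(name="metaldefectimagesds", version="1")
print(data_asset.path)  # Blob Storage URI of that exact snapshot
```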
Step 3: Create GPU Compute Cluster
AutoML needs compute power. Here’s how you create a GPU cluster that auto-scales:
from azure.ai.ml.entities import AmlCompute
from azure.core.exceptions import ResourceNotFoundError

compute_name = "gpucluster"
try:
    _ = ml_client.compute.get(compute_name)
    print("✅ Found existing Azure ML compute target.")
except ResourceNotFoundError:
    print(f"🛠️ Creating a new Azure ML compute cluster '{compute_name}'…")
    compute_config = AmlCompute(
        name=compute_name,
        type="amlcompute",
        size="Standard_NC16as_T4_v3",       # GPU VM
        idle_time_before_scale_down=1200,   # Seconds of idle time before scaling down
        min_instances=0,                    # Scale to zero when idle
        max_instances=4,
    )
    ml_client.begin_create_or_update(compute_config).result()
    print("✅ Done")
The cluster scales from 0 to 4 instances based on workload, so you only pay for what you use.
Step 4: Configure AutoML Training
Now comes the magic. Here’s the entire configuration for an AutoML image classification job using a specific model (here, resnet34). You can also let AutoML search across all the models available in its image classification library.
from azure.ai.ml import automl

image_classification_job = automl.image_classification(
    compute=compute_name,
    experiment_name=exp_name,
    training_data=my_training_data_input,
    validation_data=my_validation_data_input,
    target_column_name="label",
)

# Set training parameters
image_classification_job.set_limits(timeout_minutes=60)
image_classification_job.set_training_parameters(model_name="resnet34")
That’s approximately 10 lines of code to configure what would traditionally require hundreds of lines and deep expertise.
Step 5: Hyperparameter Tuning (Optional)
Want to explore multiple models and configurations?
from azure.ai.ml import automl
from azure.ai.ml.automl import ClassificationPrimaryMetrics, SearchSpace
from azure.ai.ml.sweep import BanditPolicy, Choice, Uniform

image_classification_job = automl.image_classification(
    compute=compute_name,                                  # Compute cluster
    experiment_name=exp_name,                              # Azure ML experiment
    training_data=my_training_data_input,                  # Training data
    validation_data=my_validation_data_input,              # Validation data
    target_column_name="label",                            # Target column
    primary_metric=ClassificationPrimaryMetrics.ACCURACY,  # Optimization metric
    tags={"usecase": "metal defect", "type": "computer vision",
          "product": "azure ML", "ai": "image classification", "hyper": "YES"},
)

image_classification_job.set_limits(
    timeout_minutes=60,       # Timeout in minutes
    max_trials=5,             # Maximum number of trials
    max_concurrent_trials=2,  # Concurrent trainings
)

image_classification_job.extend_search_space([
    SearchSpace(
        model_name=Choice(["vitb16r224", "vits16r224"]),
        learning_rate=Uniform(0.001, 0.01),
        number_of_epochs=Choice([15, 30]),
    ),
    SearchSpace(
        model_name=Choice(["resnet50"]),
        learning_rate=Uniform(0.001, 0.01),
        layers_to_freeze=Choice([0, 2]),
    ),
])

image_classification_job.set_sweep(
    sampling_algorithm="Random",  # Random sampling over hyperparameter combinations
    early_termination=BanditPolicy(
        evaluation_interval=2,  # Evaluate every 2 intervals
        slack_factor=0.2,       # Terminate runs 20% worse than the best so far
        delay_evaluation=6,     # Wait 6 intervals before evaluating terminations
    ),
)
AutoML will now automatically try different model architectures, learning rates, and augmentation strategies to find the best configuration.
Step 6: Launch Training
Submit the job and monitor progress:
# Submit the job
returned_job = ml_client.jobs.create_or_update(image_classification_job)
print(f"✅ Created job: {returned_job}")
# Stream the logs in real-time
ml_client.jobs.stream(returned_job.name)
While training runs, you can monitor metrics, view logs, and track progress through the Azure ML Studio UI or programmatically.
Step 7: Results
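The notebooks inspect the results in Azure ML Studio. Programmatically, a common pattern is to read the best child run’s metrics through the MLflow tracking integration. A sketch, assuming the `ml_client` from Step 1 and the `returned_job` from Step 6 (the `automl_best_child_run_id` tag is how AutoML marks its winner):

```python
import mlflow
from mlflow.tracking.client import MlflowClient

# Point MLflow at the workspace tracking server
tracking_uri = ml_client.workspaces.get(ml_client.workspace_name).mlflow_tracking_uri
mlflow.set_tracking_uri(tracking_uri)

# The parent AutoML run tags the best child run
mlflow_client = MlflowClient()
parent_run = mlflow_client.get_run(returned_job.name)
best_child_run_id = parent_run.data.tags["automl_best_child_run_id"]
best_run = mlflow_client.get_run(best_child_run_id)
print(best_run.data.metrics)  # Accuracy and the other logged metrics
```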
Step 8: Deploy to Production
Once training completes, deploy the best model as a REST endpoint:
from azure.ai.ml.entities import ManagedOnlineEndpoint

# Create endpoint configuration
online_endpoint_name = "metal-defects-classification"
endpoint = ManagedOnlineEndpoint(
    name=online_endpoint_name,
    description="Metal defects image classification",
    auth_mode="key",
    tags={
        "usecase": "metal defect",
        "type": "computer vision",
    },
)

# Create the endpoint
ml_client.online_endpoints.begin_create_or_update(endpoint).result()
Your model is now a production API endpoint, ready to classify images at scale.
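The endpoint expects a JSON body carrying the image as a base64 string, and building that payload is purely local. A sketch; the exact schema can vary by SDK version, so treat the field names as assumptions and check them against your endpoint’s scoring script:

```python
import base64
import json

def build_payload(image_path: str) -> str:
    """Encode an image file into a JSON body for an AutoML
    image-classification endpoint (field names assumed)."""
    with open(image_path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("utf-8")
    return json.dumps({"input_data": {"columns": ["image"], "data": [encoded]}})
```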
Beyond the Cloud: Edge Deployment with ONNX
One of the most powerful aspects of this approach is flexibility in deployment. The repository includes a third notebook demonstrating how to export your trained model to ONNX (Open Neural Network Exchange) format for edge deployment.
This means you can:
- Deploy models on IoT devices for real-time inference without cloud connectivity
- Reduce latency by processing images locally on edge hardware
- Lower costs by eliminating constant cloud API calls
- Ensure privacy by keeping sensitive images on-premises
The ONNX export process is straightforward and integrates seamlessly with the AutoML workflow. Your cloud-trained model can run anywhere ONNX Runtime is supported — from Raspberry Pi devices to industrial controllers.
import onnxruntime

# Load the ONNX model exported from Azure ML
session = onnxruntime.InferenceSession("model.onnx")
input_name = session.get_inputs()[0].name

# Run inference locally (image_data: a preprocessed float32 batch)
results = session.run(None, {input_name: image_data})
This cloud-to-edge workflow is particularly valuable for manufacturing, retail, and remote monitoring scenarios where edge processing is essential.
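The one piece the snippet above leaves implicit is preprocessing: the model expects a normalized float32 batch, not raw pixels. A sketch assuming ImageNet mean/std normalization and a 224×224 input, which is common for these architectures but should be verified against your model’s actual input via `session.get_inputs()`:

```python
import numpy as np

def preprocess(image_hwc_uint8: np.ndarray) -> np.ndarray:
    """Convert an HxWx3 uint8 RGB image into a 1x3x224x224 float32 batch
    using ImageNet mean/std normalization (assumed; verify for your model)."""
    assert image_hwc_uint8.shape[:2] == (224, 224), "resize to 224x224 first"
    mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)
    std = np.array([0.229, 0.224, 0.225], dtype=np.float32)
    img = image_hwc_uint8.astype(np.float32) / 255.0
    img = (img - mean) / std
    img = img.transpose(2, 0, 1)   # HWC -> CHW
    return img[np.newaxis, ...]    # add batch dimension
```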
Interactive webapp for image classification
Interpreting model predictions
The deployed endpoint returns a base64-encoded visualization image string when both model_explainability and visualizations are set to True in the request.
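Decoding that base64 string back into a viewable image is a one-liner; a minimal sketch:

```python
import base64

def save_visualization(b64_string: str, out_path: str) -> None:
    """Decode the base64 image string returned by the endpoint
    and write it to disk as an image file."""
    with open(out_path, "wb") as f:
        f.write(base64.b64decode(b64_string))
```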
Why This Matters?
In the AI era, the competitive advantage isn’t about who can build the most complex models — it’s about who can deploy effective solutions fastest. Azure AutoML for Images democratizes computer vision by making sophisticated ML accessible to a broader audience.
Small teams can now accomplish what previously required dedicated ML specialists. Prototypes that took months can be built in days. And the quality? Often on par with or better than manually crafted solutions, thanks to AutoML’s systematic approach and access to cutting-edge techniques.
What the Code Reveals
Looking at the actual implementation reveals several important insights:
- Minimal Boilerplate: The entire training pipeline — from data upload to model deployment — requires less than 50 lines of meaningful code. Compare this to traditional PyTorch or TensorFlow implementations that often exceed several hundred lines.
- Built-in Best Practices: Notice how the code automatically manages concerns like data versioning, experiment tracking, and compute auto-scaling. These aren’t afterthoughts — they’re integral to the platform.
- Production-Ready from Day One: The deployed endpoint isn’t a prototype. It includes authentication, scaling, monitoring, and all the infrastructure needed for production workloads. You’re building production systems, not demos.
- Flexibility Without Complexity: The simple API hides complexity without sacrificing control. Need to specify a particular model architecture? One parameter. Want hyperparameter tuning? Add a few lines. The abstraction level is perfectly calibrated.
- Observable and Debuggable: The `.stream()` method and comprehensive logging mean you’re never in the dark about what’s happening. You can monitor training progress, inspect metrics, and debug issues — all critical for real projects.
The Cost of Complexity
Traditional ML projects fail not because of technology limitations but because of complexity. The learning curve is steep, the iteration cycles are long, and the resource requirements are high. By abstracting away this complexity, AutoML for Images changes the economics of computer vision projects.
You can now:
- Validate ideas quickly: Test whether image classification solves your problem before committing significant resources
- Iterate faster: Experiment with different approaches in hours rather than weeks
- Scale expertise: Enable more team members to work with computer vision, not just ML specialists
Conclusion
Image classification is a fundamental building block for countless AI applications. Azure AutoML for Images makes it accessible, practical, and production-ready. Whether you’re a seasoned data scientist looking to accelerate your workflow or a developer taking your first steps into computer vision, this approach offers a compelling path forward.
The future of ML isn’t about writing more complex code — it’s about writing smarter code that leverages powerful platforms to deliver business value faster. This repository shows you exactly how to do that.
Practical Tips from the Code
After reviewing the notebooks, here are some key takeaways for your own projects:
- Start with a Single Model: The basic configuration with `model_name="resnet34"` is perfect for initial experiments. Only move to hyperparameter sweeps once you’ve validated your data and use case.
- Use Tags Strategically: The code demonstrates adding tags to jobs and endpoints (e.g., `"usecase": "metal defect"`). This becomes invaluable when managing multiple experiments and models in production.
- Leverage Auto-Scaling: The compute configuration with `min_instances=0` means you’re not paying for idle resources. The cluster scales up when needed and scales down to zero when idle.
- Monitor Training Live: The `ml_client.jobs.stream()` method is your best friend during development. You see exactly what’s happening and can catch issues early.
- Version Your Data: Creating named data assets (`name="metaldefectimagesds"`) means your experiments are reproducible. You can always trace back which data version produced which model.
- Think Cloud-to-Edge: Even if you’re deploying to the cloud initially, the ONNX export capability gives you flexibility for future edge scenarios without retraining.
Resources
- Azure ML: https://azure.microsoft.com/en-us/products/machine-learning
- Demos notebooks: https://github.com/retkowsky/image-classification-azure-automl-for-images
- AutoML for Images documentation: https://learn.microsoft.com/en-us/azure/machine-learning/how-to-auto-train-image-models
- Available models: Set up AutoML for computer vision — Azure Machine Learning | Microsoft Learn
- Connect with the author: https://www.linkedin.com/in/serger/