Transitioning legacy ACI inference web services to managed online endpoints

Former Employee

Sep 16, 2022

Model deployment is one of the most critical components of any machine learning system, and model deployment in Azure Machine Learning (AzureML) is evolving. Until now, AzureML has supported Azure Container Instances (ACI) and Azure Kubernetes Service (AKS) as its traditional deployment targets for models.

Recently, at Build 2022, we released managed online endpoints, which provide a unified, turnkey interface to invoke and manage model deployments on Microsoft-managed compute. You get scalable and reliable endpoints without having to manage infrastructure. Several of our customers and partners are already using this inference capability to automate model deployments for production use.

The developer experience in AzureML is also evolving. The AzureML CLI v2 and Python SDK v2 no longer support legacy ACI web services. We highly recommend upgrading to v2 to take full advantage of its consistency and new features, which accelerate the machine learning lifecycle in production environments.

In this blog post, we summarize the benefits of managed online endpoints, compare their costs with ACI, and explain how you can transition your existing ACI workloads to managed online endpoints.

*As of September 2022, ACI web services are in maintenance mode and will not receive new features.

*AzureML CLI v1 will be retired on 30 Sep 2025; see CLI & SDK v2 for details.

What managed online endpoints bring you

Managed online endpoints handle serving, scaling, securing, and monitoring your machine learning models, so you don't have to manage the underlying infrastructure. Notably, ACI web services were recommended only for dev/test environments, whereas managed online endpoints are designed for production environments.

Here are some benefits of using managed online endpoints:

Optimize cost

Wider choice of VM SKUs, GPU-optimized inference with Triton, and greater scalability than ACI
Schedule-based, metrics-based, or combined autoscaling
Cost visibility at the endpoint and deployment level

Streamline development

Declarative deployment with YAML, easy to use for GitOps
Locally debug deployment code & dependencies with VS Code
Multiple deployments with different traffic settings

Streamline operations

Managed infrastructure and security enhancements, including network isolation
Safe rollout of new deployments and controlled rollout of in-place updates
Logs, application diagnostics & advanced performance monitoring
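To make "multiple deployments with different traffic settings" and safe rollout more concrete, here is a toy Python simulation of a percentage-based split between two deployments behind one endpoint. It is only a sketch: the deployment names `blue` and `green`, the helper `pick_deployment`, and the rule that the percentages must total 100 are illustrative assumptions, and this is not how AzureML routes requests internally.

```python
import random
from typing import Dict


def pick_deployment(traffic: Dict[str, int], rng: random.Random) -> str:
    """Hypothetical helper: pick a deployment with probability equal to its traffic share.

    `traffic` maps deployment names to integer percentages. For this sketch we
    assume the percentages must add up to 100.
    """
    if sum(traffic.values()) != 100:
        raise ValueError("traffic percentages must sum to 100")
    # Draw a number in [0, 100) and walk the cumulative percentages.
    draw = rng.randrange(100)
    cumulative = 0
    for name, percent in traffic.items():
        cumulative += percent
        if draw < cumulative:
            return name
    raise AssertionError("unreachable when percentages sum to 100")


# Hypothetical safe rollout: keep 90% of requests on the current "blue"
# deployment and send 10% to the new "green" deployment.
traffic = {"blue": 90, "green": 10}
rng = random.Random(0)  # fixed seed so the simulation is repeatable
counts = {name: 0 for name in traffic}
for _ in range(10_000):
    counts[pick_deployment(traffic, rng)] += 1
print(counts)
```

Once the new deployment looks healthy at a small share, the rollout continues by shifting the percentages (for example to 50/50, then 0/100) before removing the old deployment.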

Upgrade guidance

There are two ways to upgrade:

Deploy to managed online endpoints yourself, reusing the model and environment you deployed to ACI.

You can use AzureML CLI v2, Python SDK v2, or the REST API to deploy your models to managed online endpoints. This approach is highly recommended for customers who regularly create and delete ACI services.

Use the upgrade tool.

We provide documentation and scripts to support the upgrade. The tool automatically creates a new online endpoint and leaves your original service unaffected. You can then safely route traffic to the new endpoint and delete the old one.

There are a few things to note when upgrading from ACI web services:

The scoring URL will change. For example, ACI web service scoring URLs look like http://aaaaaa-bbbbb-1111.westus.azurecontainer.io/score, while managed online endpoint scoring URLs look like https://endpoint-name.westus.inference.ml.azure.com/score
AzureML CLI v1 and Python SDK v1 do not support managed online endpoint operations, so please use CLI/SDK v2 or the REST APIs.
For legacy ACI deployments, you specify CPU and memory requirements. For managed online endpoints, you specify only the VM SKU to use.
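To illustrate the scoring URL change, here is a minimal standard-library Python sketch that builds (but does not send) a scoring request in the new URL format. The endpoint name `my-endpoint`, the region, the key placeholder, the `{"data": ...}` payload, and the helper `build_scoring_request` are all hypothetical; the payload your endpoint accepts depends on your scoring script. It assumes key-based authentication, where the endpoint key is sent as a bearer token.

```python
import json
import urllib.request


def build_scoring_request(endpoint_name: str, region: str, key: str, payload: dict) -> urllib.request.Request:
    """Hypothetical helper: build a POST request for a managed online endpoint."""
    # New URL pattern: https://<endpoint-name>.<region>.inference.ml.azure.com/score
    url = f"https://{endpoint_name}.{region}.inference.ml.azure.com/score"
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Content-Type": "application/json",
            # Assumes key-based auth: the endpoint key is passed as a bearer token.
            "Authorization": f"Bearer {key}",
        },
        method="POST",
    )


# Placeholder values; substitute your own endpoint name, region, and key.
request = build_scoring_request("my-endpoint", "westus", "<your-endpoint-key>", {"data": [[1, 2, 3]]})
print(request.full_url)
# To actually call the endpoint:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```

Client code that previously pointed at the ACI URL typically only needs the URL and credentials swapped, provided the scoring script keeps the same input format.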

The following table shows an example mapping from CPU/memory requirements to corresponding SKUs.

Table 1. Suggested VM SKU for different resource requirements of ACI web services.

CPU cores	Memory (GB)	Suggested SKU
(0, 1]	(0, 1.2]	DS1 V2
(1, 2]	(1.2, 1.7]	F2s V2
(1, 2]	(1.7, 4.7]	DS2 V2
(1, 2]	(4.7, 13.7]	E2s V3
(2, 4]	(0, 5.7]	F4s V2
(2, 4]	(5.7, 11.7]	DS3 V2
(2, 4]	(11.7, 16]	E4s V3

* "(" means greater than and "]" means less than or equal to. For example, “(0, 1]” means “greater than 0 and less than or equal to 1”.
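Table 1 can be expressed as a small lookup, which may help when migrating many ACI services at once. This is a sketch; the helper name `suggest_sku` is hypothetical, and it returns `None` for CPU/memory combinations the table does not cover (for example, 1 core with 3 GB).

```python
from typing import List, Optional, Tuple

# Table 1 encoded as (cpu_low, cpu_high, mem_low, mem_high, sku).
# Every range is half-open on the left: low < value <= high.
SKU_TABLE: List[Tuple[float, float, float, float, str]] = [
    (0, 1, 0, 1.2, "DS1 V2"),
    (1, 2, 1.2, 1.7, "F2s V2"),
    (1, 2, 1.7, 4.7, "DS2 V2"),
    (1, 2, 4.7, 13.7, "E2s V3"),
    (2, 4, 0, 5.7, "F4s V2"),
    (2, 4, 5.7, 11.7, "DS3 V2"),
    (2, 4, 11.7, 16, "E4s V3"),
]


def suggest_sku(cpu: float, memory_gb: float) -> Optional[str]:
    """Return the Table 1 SKU for an ACI CPU/memory request, or None if not covered."""
    for cpu_low, cpu_high, mem_low, mem_high, sku in SKU_TABLE:
        if cpu_low < cpu <= cpu_high and mem_low < memory_gb <= mem_high:
            return sku
    return None


# Example ACI requests (CPU cores, memory in GB).
for cpu, mem in [(1, 1), (2, 4), (3.5, 8), (1, 3)]:
    print(cpu, mem, suggest_sku(cpu, mem))
```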

Cost comparison

When upgrading from ACI, note that the way you are charged changes: ACI charges for the CPU and memory you request, whereas managed online endpoints charge for the VM SKU you choose. Please use the information here to help you choose the right VM SKU for your workload.

If you anticipate steady usage over a one-year or three-year term, you can also use Reserved Instances to reduce costs.

Table 2. Approximate cost comparison of ACI web services and managed online endpoints (example for the East US 2 region, in USD).

CPU cores	Memory (GB)	ACI cost range per month (USD)	Suggested SKU	SKU pay-as-you-go per month (USD)	SKU 1-year reserved per month (USD)	SKU 3-year reserved per month (USD)
(0, 1]	(0, 1.2]	($29.565, $33.463]	DS1 V2	$41.610	$27.003	$17.696
(1, 2]	(1.2, 1.7]	($63.028, $64.652]	F2s V2	$61.758	$36.500	$22.638
(1, 2]	(1.7, 4.7]	($64.652, $74.398]	DS2 V2	$83.220	$54.086	$35.391
(1, 2]	(4.7, 13.7]	($74.398, $103.634]	E2s V3	$97.090	$57.086	$36.500
(2, 4]	(0, 5.7]	($88.695, $107.211] for 3 cores; ($118.260, $136.776] for 4 cores	F4s V2	$123.370	$73.000	$45.275
(2, 4]	(5.7, 11.7]	($107.211, $126.702] for 3 cores; ($136.776, $156.267] for 4 cores	DS3 V2	$167.170	$108.165	$70.781
(2, 4]	(11.7, 16]	($126.702, $140.671] for 3 cores; ($156.267, $170.236] for 4 cores	E4s V3	$194.180	$114.165	$73.000

* Azure costs vary by region and may change; please refer to the latest pricing.

* The monthly ACI cost is calculated as 29.5650 * X + 3.2485 * Y, where X is the CPU core request rounded up to the nearest whole number and Y is the memory request in GB rounded up to the nearest tenth.
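The formula above, together with the SKU prices in Table 2, can be turned into a quick comparison script. This is a minimal sketch that hard-codes the East US 2 example rates from this post, so treat the numbers as a snapshot rather than current pricing; the names `aci_monthly_cost` and `SKU_MONTHLY_PRICES` are hypothetical. It uses `Decimal` so that rounding memory up to the nearest tenth happens in decimal rather than binary floating-point arithmetic.

```python
import math
from decimal import ROUND_CEILING, Decimal

# Per-month ACI rates from the formula above (East US 2 example; check current pricing).
ACI_RATE_PER_CORE = Decimal("29.5650")
ACI_RATE_PER_GB = Decimal("3.2485")

# Monthly SKU prices copied from Table 2: (pay as you go, 1-year reserved, 3-year reserved).
_RAW_PRICES = {
    "DS1 V2": ("41.610", "27.003", "17.696"),
    "F2s V2": ("61.758", "36.500", "22.638"),
    "DS2 V2": ("83.220", "54.086", "35.391"),
    "E2s V3": ("97.090", "57.086", "36.500"),
    "F4s V2": ("123.370", "73.000", "45.275"),
    "DS3 V2": ("167.170", "108.165", "70.781"),
    "E4s V3": ("194.180", "114.165", "73.000"),
}
SKU_MONTHLY_PRICES = {sku: tuple(Decimal(p) for p in prices) for sku, prices in _RAW_PRICES.items()}


def aci_monthly_cost(cpu_cores: float, memory_gb: float) -> Decimal:
    """Monthly ACI cost: 29.5650 * X + 3.2485 * Y."""
    # X: CPU core request rounded up to the nearest whole number.
    x = math.ceil(cpu_cores)
    # Y: memory request in GB rounded up to the nearest tenth.
    y = Decimal(str(memory_gb)).quantize(Decimal("0.1"), rounding=ROUND_CEILING)
    return ACI_RATE_PER_CORE * x + ACI_RATE_PER_GB * y


# Example: a 2-core, 4 GB ACI service versus its suggested SKU from Table 1.
cpu, mem, sku = 2, 4, "DS2 V2"
aci = aci_monthly_cost(cpu, mem)
payg, one_year, three_year = SKU_MONTHLY_PRICES[sku]
print(f"ACI: ${aci:.3f}/month vs {sku}: ${payg} PAYG, ${one_year} 1-yr, ${three_year} 3-yr")
```

In this example the ACI service costs about $72.12 per month, above the DS2 V2 1-year reserved price ($54.086) but below its pay-as-you-go price ($83.220), which shows how much the reservation term can shift the comparison for steady workloads.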