With AzureML being the platform of choice for many PyTorch developers, we have developed the new Azure Container for PyTorch (ACPT), a curated environment to include the best of Microsoft technologies for training with PyTorch on Azure. We are excited to announce the Public Preview of ACPT within Azure Machine Learning (AzureML). This new curated environment is a lightweight, standalone environment that includes needed components to effectively run optimized training for large models on AzureML. The AzureML curated environments are available in the user’s workspace by default and are backed by cached Docker images that use the latest version of the AzureML SDK. It helps with reducing preparation costs and faster deployment time.
The ACPT curated environment expands on our existing PyTorch curated environment by including the latest PyTorch version, tested and validated against dozens of production models to ensure high quality while also providing various Microsoft technologies for training and optimization, such as and DeepSpeed. All components are already installed and validated to reduce setup costs and training time for customers.
ACPT curated environment properties and description.
Benefits of using the ACPT curated environment include:
- Optimized Training framework to set up, develop, accelerate PyTorch model on large workloads.
- Up-to-date stack with the latest compatible versions of Ubuntu, Python, PyTorch, CUDA\RocM, etc.
- Ease of use: All components installed and validated against dozens of Microsoft workloads to reduce setup costs and accelerate time to value
- Latest Training Optimization Technologies: Onnx / Onnx Runtime / Onnx Runtime Training, ORT MoE, DeepSpeed, MSCCL, and others..
- Integration with Azure ML: Track your PyTorch experiments on ML Studio or using the AML SDK
- As-IS use with pre-installed packages or build on top of the curated environment
- The image is also available as a DSVM
- Azure Customer Support
Metrics and Data
ACPT curated environment allows our customers to efficiently train PyTorch models. The optimization libraries like ONNX RunTime and DeepSpeed composed within the container can increase production speed up from 54% to 163% over regular PyTorch workloads as seen on various HuggingFace Models.
Metrics for HuggingFace Models
ACPT reduces compute costs by performing the same training jobs in significantly less time. Training runs are also easily trackable as the curated environment is integrated with AzureML tools such as the Azure Machine Learning Studio and the AzureML SDK.
Here is an example of an NLP products review finetuning training run, taking approximately two hours to complete.
Using the ACPT curated environment, we observe a decrease in the overall training time to just over
Quotes from Customers
“ACPT images help us successfully run large scale jobs on over 100 GPUs. It provides integration of different acceleration packages, such as OFED and APEX. More importantly, it alleviates our engineering efforts on building the images. It enables us to focus more on the research work.”
- Project Florence, a Microsoft AI Cognitive Services initiative
“ACPT environment helps us in running our model with Distributed Data-Parallel (DDP) implementation. We were able to successfully run the model with multi nodes and get the results effectively and efficiently. When using an environment other than ACPT, we were not able to achieve that. “- Orlando Ribas Fernandes, Co-Founder and CEO, Fashable
Learn more about the use case of Fashable here. You can also try Fashable's solution and get the limited edition of AI T-shirts on their website.
Get Started Today
Set up a curated environment with Azure Container for PyTorch
Start Azure Machine Learing for free
Learn more about PyTorch on Azure
Also, watch Microsoft Ignite 2022 to get familiar more with Azure Container for PyTorch and other Azure Mahine Learning announcements.
For additional details see: Curated environments - Azure Machine Learning | Microsoft Learn