Model training and fine-tuning with serverless compute
Published Nov 15 2023 08:00 AM

We are happy to announce the General Availability of Model Training with Serverless Compute.

Serverless compute is a fully managed, on-demand compute target that simplifies running training jobs in Azure Machine Learning. With serverless compute, machine learning (ML) professionals can focus on their expertise in building ML models rather than learning about compute infrastructure. Serverless compute also reduces the management burden on IT admins by managing the compute infrastructure and providing managed network isolation, while still meeting the most stringent enterprise security requirements. All Azure Machine Learning job types are supported, including generative AI scenarios such as fine-tuning, evaluations, and retrieval augmented generation (RAG) for large language models.

Advantages of serverless compute

  • Azure Machine Learning manages creating, configuring, scaling, deleting, and patching the compute infrastructure, reducing management overhead for IT admins
  • Eliminates the repetitive process of creating compute with the same settings for every workspace
  • Simplifies the job submission experience by reducing the steps needed to run a job
  • ML professionals don’t need to learn about compute concepts, the various compute types, or their related properties, and can instead focus on the job specification
  • Dynamically selects a default VM size for the training job when none is specified
  • Meets the most stringent enterprise security requirements with support for no-public-IP compute, private link workspaces, customer virtual networks, managed virtual networks, managed identities, and user identities. Admins retain control through quotas and Azure Policy.
  • Enterprises can optimize costs by specifying the exact resources each job needs at runtime, and can monitor job utilization metrics to right-size those resources. Low-priority VMs are also supported.
  • Elastic training support for quota, low-priority, and fault-tolerance scenarios
  • Reduced wait times before jobs start executing in some cases

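As a sketch of the runtime resource control described above, a job's YAML specification can pin an exact VM size and opt into low-priority capacity. The instance type, node count, and tier values below are illustrative assumptions, not recommendations:

```yaml
# Fragment of a command job specification (CLI v2 YAML).
# instance_type, instance_count, and job_tier values are illustrative.
resources:
  instance_type: Standard_E4s_v3   # exact VM size this job needs
  instance_count: 1                # number of nodes
queue_settings:
  job_tier: spot                   # low-priority VMs for cost savings
```

After a run, the job's utilization metrics can guide adjusting `instance_type` and `instance_count` for subsequent submissions.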
Get Started
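As a minimal, hypothetical example of the simplified submission experience, a command job can omit the compute target entirely and run on serverless compute. The script name, code path, and environment below are placeholder assumptions:

```yaml
$schema: https://azuremlschemas.azureedge.net/latest/commandJob.schema.json
command: python train.py          # placeholder training script
code: ./src                       # placeholder source directory
environment: azureml:AzureML-sklearn-1.0-ubuntu20.04-py38-cpu@latest
# No compute target is specified, so the job runs on serverless compute.
```

Submitted with `az ml job create --file job.yml`, a job like this runs without any pre-created compute cluster in the workspace.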

Version history
Last update: Nov 18 2023 11:51 PM