AI - Machine Learning Blog

Model training and fine-tuning with serverless compute

vijetaj
Microsoft
Nov 15, 2023

We are happy to announce the General Availability of Model Training with Serverless Compute.

Serverless compute is a fully managed, on-demand compute target that simplifies running training jobs in Azure Machine Learning. With serverless compute, machine learning (ML) professionals can focus on building ML models rather than on learning about compute infrastructure. Serverless compute also reduces the management burden on IT admins by managing the compute infrastructure and providing managed network isolation, while still meeting the most stringent enterprise security requirements. All Azure Machine Learning job types are supported, including generative AI scenarios such as fine-tuning, evaluation, and retrieval-augmented generation (RAG) for large language models.

Advantages of serverless compute

  • Azure Machine Learning manages creating, setting up, scaling, deleting, and patching for compute infrastructure, reducing management overhead on IT admins
  • No need for enterprises to perform repetitive processes to create compute using the same settings for each workspace
  • Simplifies the job submission experience by reducing the steps involved to run a job
  • ML professionals don’t need to learn about compute concepts, various compute types, or related properties and instead can just focus on the job specification
  • A default VM size for the training job is selected dynamically when none is specified
  • Meets the most stringent enterprise security requirements by supporting no-public-IP compute, private link workspaces, customer virtual networks, managed virtual networks, managed identity, and user identity. Admins retain control through quota and Azure policies.
  • Enterprises can optimize costs by specifying the exact resources each job needs at runtime, and can monitor a job's utilization metrics to right-size those resources. Low-priority VMs are also supported.
  • Elastic training support for quota-limited, low-priority, and fault-tolerance scenarios
  • Reduced wait times before jobs start executing in some cases
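
To make the simplified submission experience in the list above concrete, here is a minimal sketch using the Azure Machine Learning Python SDK v2 (`azure-ai-ml`). It rests on several assumptions: the helper names `serverless_job_kwargs` and `submit` are hypothetical, the environment name and VM size are placeholders, and the `instance_type`, `instance_count`, and `queue_settings` arguments follow the SDK's documented examples as best we recall, so check them against your installed SDK version. The idea is that leaving out the compute target is what requests serverless compute, while resources and the low-priority tier remain optional per-job settings.

```python
from typing import Any, Dict, Optional


def serverless_job_kwargs(
    command_line: str,
    environment: str,
    instance_type: Optional[str] = None,
    instance_count: Optional[int] = None,
    low_priority: bool = False,
) -> Dict[str, Any]:
    """Build keyword arguments for azure.ai.ml.command() with no compute target."""
    kwargs: Dict[str, Any] = {
        "command": command_line,
        "environment": environment,
        # Deliberately no "compute" key: omitting it asks for serverless compute.
    }
    # Optional: state the resources this job needs instead of relying on defaults.
    if instance_type is not None:
        kwargs["instance_type"] = instance_type
    if instance_count is not None:
        kwargs["instance_count"] = instance_count
    # Optional: request low-priority (spot) capacity to reduce cost.
    # Assumption: the SDK accepts queue_settings={"job_tier": "spot"} here.
    if low_priority:
        kwargs["queue_settings"] = {"job_tier": "spot"}
    return kwargs


def submit(
    kwargs: Dict[str, Any],
    subscription_id: str,
    resource_group: str,
    workspace: str,
):
    """Submit the job; requires the azure-ai-ml and azure-identity packages."""
    # Imported lazily so the helper above works without the SDK installed.
    from azure.ai.ml import MLClient, command
    from azure.identity import DefaultAzureCredential

    ml_client = MLClient(
        DefaultAzureCredential(), subscription_id, resource_group, workspace
    )
    job = command(**kwargs)
    return ml_client.jobs.create_or_update(job)
```

A minimal job would pass only a command and an environment, for example `serverless_job_kwargs("python train.py", "<your-environment>@latest")`, and leave the VM size to the service default. A cost-tuned job might add `instance_type="Standard_NC24"`, `instance_count=4`, and `low_priority=True`; those values are illustrative only.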

Updated Nov 19, 2023
Version 2.0
    edgBR
    Copper Contributor

    Does it work when the network isolation is not managed?