Federated learning is an innovative approach to machine learning for compliance. It enables multiple organizations to train better models together while meeting their respective data privacy and security standards. In a nutshell, federated learning consists of training a model partially within distinct trust boundaries (countries, institutions, companies, tenants), also called silos; the partial models are then aggregated centrally in an orchestrator. This cycle between silos and orchestrator is repeated until the model converges and generalizes.
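The silo-then-aggregate cycle can be sketched as a weighted average of silo models, known as federated averaging (FedAvg), a common aggregation strategy. This is a hedged illustration of the general idea, not the exact algorithm used by any specific Azure component; all names here are invented for the example:

```python
# A minimal sketch of one round of federated averaging (FedAvg).
# Function and variable names are illustrative, not part of any Azure API.
import numpy as np

def fedavg(silo_weights, silo_samples):
    """Average silo model weights, weighted by each silo's sample count."""
    total = sum(silo_samples)
    return sum(w * (n / total) for w, n in zip(silo_weights, silo_samples))

# Each silo trains locally and shares only its model weights with the
# orchestrator; the raw data never leaves the trust boundary.
silo_weights = [np.array([1.0, 2.0]), np.array([3.0, 4.0]), np.array([5.0, 6.0])]
silo_samples = [100, 100, 200]
global_weights = fedavg(silo_weights, silo_samples)  # -> array([3.5, 4.5])
```

The orchestrator then sends the aggregated weights back to the silos for the next round of local training.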
Our solutions harness the potential of federated learning by combining the capabilities of Azure for provisioning flexible infrastructures for the silos with Azure Machine Learning for orchestrating training at scale. This also integrates with important features such as Azure confidential computing, encryption at rest, and differential privacy, raising the bar for confidential ML.
Direct links to action
To jump straight to a hands-on experience, check the links below:
Federated learning unblocks complex industry scenarios
The federated learning paradigm is flexible and can tackle different organizational scenarios where traditional ML would be blocked. Let’s take two common use cases.
Use case 1 - One company, multiple trust boundaries - A company has data in distinct regions across the globe, each with its own regulations restricting the movement of data. Until now, it could only train local models, which limits generalization. The company wants to harness all this data where it resides, yet achieve better results than training local models alone.
Solution: They create an AzureML workspace for orchestration. This workspace has multiple compute resources and datastores, each located in a distinct region, within a given trust boundary. Data scientists use this workspace to run federated learning experiments: model training happens in parallel, on data from region A with a compute in region A, and so on for every region that holds data. Each partial model is then transferred back to the orchestrator's region for aggregation. This happens iteratively, through multiple cycles of training and aggregation, until the model converges and performs better than a model trained in a single region alone.
Use case 2 - Multiple organizations, each with their own trust boundary, whether in-cloud or on-premises - Multiple organizations (hospitals, banks) come together as a federation to tackle a common machine learning problem (e.g., genomics, fraud detection). They hope that by enabling ML training on all their data combined, they'll build better models and bring more innovative solutions to their industry.
Solution: The federation creates an AzureML workspace where data scientists from those organizations can run their jobs as a collaborative team. The workspace connects to computes hosted and maintained by each organization, some in their respective tenants, some on their own on-premises HPC clusters. ML training happens locally within each organization; the model is sent back to the federation for aggregation and iteration, and metrics are displayed in the workspace for the team to collaborate on.
AzureML SDK v2 to easily write federated learning pipelines
The AzureML SDK v2 provides the foundation to implement a federated learning pipeline. In the example below, the pipeline first trains a model independently on 3 distinct computes and datasets (three trust boundaries). These steps are then aggregated to produce a single model on the orchestrator compute. This process repeats multiple times until convergence.
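As a hedged sketch of what such a pipeline can look like with the SDK v2 `dsl.pipeline` decorator (the component files, compute names, and silo names below are assumptions for illustration, and running it requires a real workspace and component specs):

```python
# Sketch of a federated pipeline with the AzureML SDK v2 (azure-ai-ml).
# Component paths, compute names, and silo names are placeholders.
from azure.ai.ml import dsl, load_component

train_component = load_component(source="components/train/spec.yaml")
aggregate_component = load_component(source="components/aggregate/spec.yaml")

SILOS = ["silo-eastus", "silo-westeurope", "silo-australiaeast"]
NUM_ITERATIONS = 3

@dsl.pipeline(description="FL: train in each silo, aggregate in the orchestrator")
def fl_pipeline():
    # Assumes the train component's checkpoint input is optional (None on round 1).
    checkpoint = None
    for _ in range(NUM_ITERATIONS):
        silo_models = {}
        for i, silo in enumerate(SILOS, start=1):
            step = train_component(checkpoint=checkpoint)
            step.compute = f"{silo}-compute"   # training runs inside the silo
            silo_models[f"model_{i}"] = step.outputs.model
        agg = aggregate_component(**silo_models)
        agg.compute = "orchestrator-compute"   # aggregation in the orchestrator
        checkpoint = agg.outputs.model
    return {"final_model": checkpoint}
```

Submitting `fl_pipeline()` through an `MLClient` then runs each training step on its silo's compute, while only model outputs travel back to the orchestrator for aggregation.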
Because the entire FL pipeline is a regular AzureML experiment, it also integrates with MLflow for metrics reporting (see below), and unlocks all the usual benefits of AzureML as a platform for experiment management, model deployment, monitoring, and more.
Our FL accelerator repository provides multiple examples of pipelines and training code you can use as a starting point for developing your own:
Heterogeneous infrastructure needs can be met within a single experience
Different scenarios will require different provisioning strategies, but Azure provides the flexibility to cover many federated learning use cases: silo compute and data can live in a single Azure tenant, in different tenants through AKS computes, or entirely outside Azure on-premises through Azure Arc. All of these can be attached to a single AzureML workspace that serves as one entry point for a team of data scientists. Regardless of the provisioning setup, the data science experience remains the same: the team leverages the AzureML SDK to create and run their experiments.
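For example, an on-premises cluster connected through Azure Arc can be attached to the workspace as a Kubernetes compute with the Azure ML CLI v2; the resource group, workspace, compute name, and cluster resource ID below are placeholders:

```shell
# Attach an Arc-enabled Kubernetes cluster (on-premises silo) to the workspace.
# All names and the resource ID are placeholders for your own values.
az ml compute attach \
  --resource-group my-rg \
  --workspace-name my-fl-workspace \
  --type Kubernetes \
  --name onprem-silo-compute \
  --resource-id "/subscriptions/<sub-id>/resourceGroups/my-rg/providers/Microsoft.Kubernetes/connectedClusters/my-arc-cluster"
```

Once attached, the compute can be targeted by pipeline steps exactly like an in-tenant AzureML compute cluster.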
The example infrastructure schema below shows a simple setup where everything fits within a single tenant. The resources of each trust boundary (orchestrator, silos) are kept independent and isolated from one another by provisioning them with distinct identities, virtual networks, private links, etc.
Our FL accelerator repository provides ready-to-deploy sandboxes for your team to get started and evaluate applicability to your specific setup. Our provisioning guide will provide a pick-and-choose approach to designing an infrastructure tailored to your needs.
Your provisioning strategy can also leverage complementary Azure capabilities such as confidential computing, where each orchestrator/silo compute runs on confidential virtual machines, along with encryption at rest and Managed HSM for key management.
Learn more
To stay updated on Azure Machine Learning announcements, watch our breakout sessions from Microsoft Build.