Federated learning is an innovative approach to machine learning for compliance. It enables multiple organizations to train better models together while meeting their respective data privacy and security standards. In a nutshell, federated learning consists of training a model partially within distinct trust boundaries (countries, institutions, companies, tenants), also called silos; the partial models are then aggregated centrally in an orchestrator. This cycle between silos and orchestrator is repeated until the model converges and generalizes.
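The silo-then-aggregate cycle can be sketched as a weighted average of silo models, known as federated averaging (FedAvg), a common aggregation strategy. This is a hedged illustration of the general idea, not the exact algorithm used by any specific Azure component; all names here are invented for the example:

```python
# A minimal sketch of one round of federated averaging (FedAvg).
# Function and variable names are illustrative, not part of any Azure API.
import numpy as np

def fedavg(silo_weights, silo_samples):
    """Average silo model weights, weighted by each silo's sample count."""
    total = sum(silo_samples)
    return sum(w * (n / total) for w, n in zip(silo_weights, silo_samples))

# Each silo trains locally and shares only its model weights with the
# orchestrator; the raw data never leaves the trust boundary.
silo_weights = [np.array([1.0, 2.0]), np.array([3.0, 4.0]), np.array([5.0, 6.0])]
silo_samples = [100, 100, 200]
global_weights = fedavg(silo_weights, silo_samples)  # -> array([3.5, 4.5])
```

The orchestrator then sends the aggregated weights back to the silos for the next round of local training.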
Our solutions harness the potential of federated learning by combining the capabilities of Azure for provisioning flexible infrastructures for the silos with Azure Machine Learning for orchestrating training at scale. This also integrates with important features such as Azure confidential computing, encryption at rest, and differential privacy, raising the bar for confidential ML.
Direct links to action
To jump straight to a hands-on experience, check the links below:
Federated learning unblocks complex industry scenarios
The federated learning paradigm is flexible and can tackle different organizational scenarios where traditional ML would be blocked. Let’s take two common use cases.
Use case 1 - One company, multiple trust boundaries - A company has data in distinct regions across the globe, each with its own regulations restricting the movement of data. Until now, it could only train local models, which limits generalization. The company wants to harness all this data where it resides, yet achieve better results than training local models alone.
Solution: They create an AzureML workspace for orchestration. This workspace has multiple compute resources and datastores, each located in a distinct region, within a given trust boundary. Data scientists use this workspace to run federated learning experiments: model training happens in parallel, on data from region A with a compute in region A, and so on for every region that holds data. Each partial model is then transferred back to the orchestrator's region for aggregation. This happens iteratively, through multiple cycles of training and aggregation, until the model converges and performs better than a model trained in a single region alone.
Use case 2 - Multiple organizations, each with their own trust boundary, whether in-cloud or on-premises - Multiple organizations (hospitals, banks) come together as a federation to tackle a common machine learning problem (e.g., genomics, fraud detection). They hope that by enabling ML training on all their data combined, they'll build better models and bring more innovative solutions to their industry.
Solution: The federation creates an AzureML workspace where data scientists from those organizations can run their jobs as a collaborative team. The workspace connects to computes hosted and maintained by each organization, some in their respective tenants, some on their own on-premises HPC clusters. ML training happens locally within each organization; the model is sent back to the federation for aggregation and iteration, and metrics are displayed in the workspace for the team to collaborate on.
AzureML SDK v2 to easily write federated learning pipelines
The AzureML SDK v2 provides the foundation to implement a federated learning pipeline. In the example below, the pipeline first trains a model independently on 3 distinct computes and datasets (three trust boundaries). These steps are then aggregated to produce a single model on the orchestrator compute. This process repeats multiple times until convergence.
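As a hedged sketch of what such a pipeline can look like with the SDK v2 `dsl.pipeline` decorator (the component files, compute names, and silo names below are assumptions for illustration, and running it requires a real workspace and component specs):

```python
# Sketch of a federated pipeline with the AzureML SDK v2 (azure-ai-ml).
# Component paths, compute names, and silo names are placeholders.
from azure.ai.ml import dsl, load_component

train_component = load_component(source="components/train/spec.yaml")
aggregate_component = load_component(source="components/aggregate/spec.yaml")

SILOS = ["silo-eastus", "silo-westeurope", "silo-australiaeast"]
NUM_ITERATIONS = 3

@dsl.pipeline(description="FL: train in each silo, aggregate in the orchestrator")
def fl_pipeline():
    # Assumes the train component's checkpoint input is optional (None on round 1).
    checkpoint = None
    for _ in range(NUM_ITERATIONS):
        silo_models = {}
        for i, silo in enumerate(SILOS, start=1):
            step = train_component(checkpoint=checkpoint)
            step.compute = f"{silo}-compute"   # training runs inside the silo
            silo_models[f"model_{i}"] = step.outputs.model
        agg = aggregate_component(**silo_models)
        agg.compute = "orchestrator-compute"   # aggregation in the orchestrator
        checkpoint = agg.outputs.model
    return {"final_model": checkpoint}
```

Submitting `fl_pipeline()` through an `MLClient` then runs each training step on its silo's compute, while only model outputs travel back to the orchestrator for aggregation.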
Because the entire FL pipeline is a regular AzureML experiment, it also integrates with MLflow for metrics reporting (see below), and unlocks all the usual benefits of AzureML as a platform for experiment management, model deployment, monitoring, and more.
Our FL accelerator repository provides multiple examples of pipelines and training code you can use as a starting point for developing your own:
Heterogeneous infrastructure needs can be met within a single experience
Different scenarios will require different provisioning strategies, but Azure provides the flexibility to cover many federated learning use cases: silo compute and data can live in a single Azure tenant, in different tenants through AKS computes, or entirely outside Azure on-premises through Azure Arc. All of these can be attached to a single AzureML workspace that serves as one entry point for a team of data scientists. Regardless of the provisioning setup, the data science experience remains the same: the team leverages the AzureML SDK to create and run their experiments.
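For example, an on-premises cluster connected through Azure Arc can be attached to the workspace as a Kubernetes compute with the Azure ML CLI v2; the resource group, workspace, compute name, and cluster resource ID below are placeholders:

```shell
# Attach an Arc-enabled Kubernetes cluster (on-premises silo) to the workspace.
# All names and the resource ID are placeholders for your own values.
az ml compute attach \
  --resource-group my-rg \
  --workspace-name my-fl-workspace \
  --type Kubernetes \
  --name onprem-silo-compute \
  --resource-id "/subscriptions/<sub-id>/resourceGroups/my-rg/providers/Microsoft.Kubernetes/connectedClusters/my-arc-cluster"
```

Once attached, the compute can be targeted by pipeline steps exactly like an in-tenant AzureML compute cluster.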
The example infrastructure schema below shows a simple setup where everything fits within a single tenant. The resources of each trust boundary (orchestrator, silos) are kept independent and isolated from one another by provisioning them with distinct identities, virtual networks, private links, etc.
Our FL accelerator repository provides ready-to-deploy sandboxes for your team to get started and evaluate applicability to your specific setup. Our provisioning guide will provide a pick-and-choose approach to designing an infrastructure tailored to your needs.
Your provisioning strategy can also leverage complementary Azure capabilities such as confidential computing, where each orchestrator/silo compute runs on confidential virtual machines, along with encryption at rest and Managed HSM for key management.
Learn more
To stay updated on Azure Machine Learning announcements, watch our breakout sessions from Microsoft Build.