Join Microsoft and NVIDIA experts at this GTC Session [WP41730]: Operationalize Large-Model Training on Azure Machine Learning using NVIDIA’s Multi-Node A100 GPUs. A GTC Session Watch Party is a replay of an original GTC talk. This is an interactive session, and we encourage you to join the discussion with any comments or questions.
Watch Party #1: This session takes place on Wednesday, Sep 21, 3:00 PM - 3:30 PM PDT (NALA) and is hosted by:
Watch Party #2: This session takes place on Thursday, Sep 22, 2:00 PM - 3:30 PM CEST and is hosted by:
- Gabrielle Davelaar, AI Technical Specialist, Microsoft
- Maxim Salnikov, Senior Azure GTM Manager, Microsoft
- Henk Boelman, Senior Cloud Advocate–AI & Machine Learning, Microsoft
- Alexander Young, Technical Marketing Engineer, NVIDIA
- Ulrich Knechtel, Microsoft Partner Manager (EMEA), NVIDIA
Deep learning models have grown in size by several orders of magnitude in recent years, and customers increasingly need large-scale infrastructure with many GPUs and large amounts of memory to train and fine-tune them. Azure Machine Learning offers a breakthrough software stack running on the latest multi-node NVIDIA GPUs, with ready-to-use environments that include stable PyTorch for Enterprise and optimization libraries such as DeepSpeed and ONNX Runtime, so data scientists can easily train large models.
We'll showcase experiments that use 1,024 A100s to scale the training of a 2T-parameter model, with a streamlined user experience at 1K+ GPU scale. We'll describe the software innovations available to customers through Azure Machine Learning, including a fully optimized PyTorch environment, that offer great performance and an easy-to-use interface for large-scale training. Use simple training pipelines on Azure Machine Learning (AzureML) to train large models on Azure using NVIDIA A100 Tensor Core GPUs.
You must be registered for GTC to join. Registration is free!