Recent Blog Articles

Scaling Speech, Language and Vision Models with Mixture of Experts Technique
Authors: Devang Patel (devang_patel), Wei Zuo (weizuo), Yu Shi (yu3shi2), Kenichi Kumatani (kekumata), Mengchen Liu (MengchenLiu), Robert Gmyr (rogmyr) and Kshama Pawar (kshama-msft)
Introducti...

Accelerate PyTorch transformer model training with ONNX Runtime – a deep dive
Get a closer look at how ONNX Runtime (ORT) for PyTorch brings about significant throughput improvements for large-scale transformer model training.