Microsoft and Hugging Face deepen generative AI partnership
Published May 21, 2024, 08:30 AM

We are dedicated to our mission of empowering generative AI developers and global organizations with the best AI infrastructure, open and proprietary foundation models, AI orchestration, and developer tools as they build and scale their copilot stacks.


Microsoft Build 2024: Deepening our generative AI partnership with Hugging Face


Today, Microsoft is thrilled to announce our deepened generative AI partnership with Hugging Face! Building on our previous collaborations, this extended partnership underscores our joint commitment to making AI more accessible through product updates and integrations with the Azure AI Model Catalog, the latest Azure infrastructure in partnership with AMD, VS Code, and HuggingChat.


By combining Microsoft's robust cloud infrastructure with Hugging Face's most popular Large Language Models (LLMs), we are enhancing our copilot stacks to provide developers with advanced tools and models for delivering scalable, responsible, and safe generative AI solutions for custom business needs.



Today, at Microsoft Build 2024, we're excited to deepen our collaboration in four key areas:


  1. We are introducing 20 new leading open Hugging Face models - like Rhea-72B-v0.5 from David Kim and Multiverse-70B from MTSAIR - into the Azure AI model catalog, further diversifying open model choice for our customers.
  2. In partnership with AMD, we are enhancing Azure AI infrastructure to support the availability of Hugging Face Hub on the ND MI300X v5, powered by AMD GPUs.
  3. Our integration of Phi-3-mini in HuggingChat marks a significant enhancement in our interactive AI offerings.
  4. Our integration of Hugging Face Spaces with Visual Studio Code streamlines the development process for AI developers.

Hugging Face is the creator of Transformers, a widely popular library for building large language models. In 2022, we announced our partnership with Hugging Face to integrate state-of-the-art AI capabilities into Azure, increasing efficiency for developers to deploy, operationalize and manage models. As we progress, our partnership continues to deepen, in service of our joint mission to ensure useful and seamless integrations that enable generative AI developers, machine learning engineers and data scientists to deploy open models of their choice to secure and scalable inference infrastructure on Azure.   


Azure AI Model Catalog: 20 new leading open-source Hugging Face models added


The Azure AI Model Catalog is the hub to discover, deploy and fine-tune the widest selection of open source and proprietary generative AI models for your use cases, RAG applications, and agents. In addition to other model providers like Cohere, Meta, and Mistral, the Hugging Face collection has a wide selection of base and fine-tuned models, like tiiuae-falcon-7b. 


Today, we’re adding 20 new popular open models - either trending or drawn from the Open LLM Leaderboard, like Smaug-72B-v0.1 from Abacus AI and Fugaku-LLM-13B, a Japanese/English text generation model from Fugaku-LLM - from the Hugging Face Hub to the Azure AI model catalog. These models offer a premium user experience when deployed from the model catalog for inferencing, driven by advanced features, software, and optimizations. For example, some of the new models are supported by Hugging Face’s Text Generation Inference (TGI) or Text Embeddings Inference (TEI) – optimized inference runtimes for efficient deployment and serving of LLMs and embedding models, respectively.


TGI enables high-performance text generation for LLMs like Falcon and StarCoder through tensor parallelism, continuous batching of incoming requests, and optimized transformers code using Flash Attention and Paged Attention. TEI achieves efficient serving of text embedding models – like BERT and its variants – by skipping the model graph compilation step at deployment and incorporating token-based dynamic batching. For more information, see Hugging Face’s articles and lists of supported models for TGI and TEI.
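As a rough sketch of what serving through TGI looks like from the client side: a running TGI server exposes a `/generate` REST route that accepts a JSON body with an `inputs` string and a `parameters` object. The endpoint URL below is a placeholder for your own deployment's scoring URI, and the parameter values are illustrative only.

```python
import json
import urllib.request


def build_tgi_payload(prompt, max_new_tokens=64, temperature=0.7):
    """Build a request body for TGI's /generate route."""
    return {
        "inputs": prompt,
        "parameters": {
            "max_new_tokens": max_new_tokens,
            "temperature": temperature,
        },
    }


def generate(endpoint, prompt, **params):
    """POST the prompt to a running TGI server and return the generated text."""
    body = json.dumps(build_tgi_payload(prompt, **params)).encode("utf-8")
    req = urllib.request.Request(
        f"{endpoint}/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["generated_text"]


# Placeholder endpoint -- replace with your deployment's URI:
# print(generate("http://localhost:8080", "Write a haiku about GPUs."))
```

The same request shape works whether TGI runs locally in a container or behind a managed Azure endpoint; only the URL and authentication differ.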


In partnership with the engineering team at Hugging Face, we plan to deepen the integration between the Hugging Face Hub and Azure AI, starting with the tenets of model discoverability, custom deployment, and fine-tuning.


  • If you’d like to be engaged in the private preview program of these features, please share your contact information here. 
  • Get started with Hugging Face inference through Azure AI using these Python samples.
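The linked Python samples cover the details; as a hedged illustration of the general pattern, a deployed Azure AI online endpoint is invoked with an HTTP POST to its scoring URI, authenticated with the endpoint's key. The URI, key, and payload shape below are placeholders, not a definitive API reference.

```python
import json
import urllib.request


def build_headers(api_key: str) -> dict:
    """Standard headers for calling an Azure AI online endpoint."""
    return {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",
    }


def score(scoring_uri: str, api_key: str, payload: dict) -> dict:
    """Send a JSON payload to the endpoint and return the parsed JSON response."""
    req = urllib.request.Request(
        scoring_uri,
        data=json.dumps(payload).encode("utf-8"),
        headers=build_headers(api_key),
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())


# Placeholder values -- substitute your deployed endpoint's URI and key:
# result = score(
#     "https://<endpoint>.<region>.inference.ml.azure.com/score",
#     "<api-key>",
#     {"inputs": "The capital of France is"},
# )
```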




Partner on Azure's latest AI infrastructure with AMD


At Microsoft Build this year, Microsoft unveiled the general availability of Azure's latest AI infrastructure offering, the ND MI300X v5, which is powered by AMD Instinct™ MI300X GPUs. Hugging Face is one of the first AI partners to harness this new AI infrastructure, achieving a new benchmark for the performance and efficiency of their models in just one month. Through a deep engineering collaboration between Hugging Face, AMD, and Microsoft, Hugging Face offers Azure customers the full acceleration capabilities of AMD’s ROCm™ open software ecosystem when using Hugging Face models with its libraries on the new MI300X instances. Hugging Face users and customers of its premium Enterprise Hub service can run over 10,000 pre-trained models on Azure without needing to rewrite their applications. Hugging Face Enterprise Hub customers will also be able to quickly and easily deploy and scale their workloads with Azure CycleCloud, Azure Kubernetes Service, and other Azure services.


Phi-3-mini on HuggingChat


We're also broadening the reach of Phi-3, a family of small models developed by Microsoft, by making them available on the HuggingChat playground. HuggingChat allows Phi-3 to meet the community where they’re at and gives developers and data scientists a great place to start experimenting with Phi-3 and discover new ways to leverage the power of small models. The deeper integration with Azure AI will enable developers to combine the power of open platforms and communities on Hugging Face with enterprise-grade offerings on Azure AI.
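For developers who want to experiment with Phi-3 beyond the HuggingChat playground, one practical detail is its chat template: turns are delimited with `<|user|>`, `<|assistant|>`, and `<|end|>` markers. The sketch below hand-builds such a prompt to illustrate the format; in practice, transformers' `tokenizer.apply_chat_template` applies the model's template for you, so treat this as an illustration rather than the canonical implementation.

```python
def build_phi3_prompt(messages):
    """Format chat messages using Phi-3's turn markers.

    Illustrative only: transformers' tokenizer.apply_chat_template
    is the supported way to produce the model's exact prompt format.
    """
    parts = []
    for m in messages:
        parts.append(f"<|{m['role']}|>\n{m['content']}<|end|>\n")
    parts.append("<|assistant|>\n")  # cue the model to respond
    return "".join(parts)


prompt = build_phi3_prompt(
    [{"role": "user", "content": "Summarize what a small language model is."}]
)
print(prompt)
```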




Visual Studio Code integration with Hugging Face Spaces


Microsoft has been working closely with Hugging Face on the developer experience, and today, we’re introducing a new “Dev Mode” feature for Hugging Face Spaces, designed to streamline the development process for AI developers. Hugging Face Spaces provides a user-friendly platform for creating and deploying AI-powered demos in minutes, with over 500,000 Spaces already created by the Hugging Face community.


With Dev Mode enabled, developers can seamlessly connect Visual Studio Code (VS Code) to their Space, eliminating the need to push local changes to the Space repository using git. This integration allows you to edit your code directly within VS Code, whether locally or in the browser, and see your changes in real time.



For example, if you want to change the color theme of a Gradio Space, you can edit the code in VS Code, and simply click "Refresh" in your Space to see the updates instantly, without the need to push changes or rebuild the Space container.



Once you are satisfied with your changes, you can commit and merge them to persist your updates, making the development process more efficient and user-friendly.



Get Started with Hugging Face on Azure

