Introducing the Azure AI Model Inference API

Microsoft

May 21, 2024

We launched the model catalog in early 2023, featuring a curated selection of open-source models that customers can trust and consume in their organizations. The Azure AI model catalog offers around ~1700 models, including the latest open-source innovations like Llama3 from Meta, but also models coming from partnerships like OpenAI, Mistral, and Cohere. Each of these models with unique capabilities that we think will inspire developers to build the next generation of copilots.

A screenshot of the Azure AI model catalog displaying the large diversity of models it brings in for customers.

To enable developers to get access to these capabilities consistently, we are launching the Azure AI model inference API, which enables customers to consume the capabilities of those models using the same syntax and the same language. This API introduces a single layer of abstraction, yet it allows each model to expose unique features or capabilities that differentiate them.

Starting today, all language models deployed as serverless API support this common API. This means you can interact with GPT-4 from Azure OpenAI Service, Cohere Command R+, or Mistral-Large, in the same way without the need for translations. These capabilities are also be available on a subset of models deployed to our self-hosted managed endpoints, unifying the consumption experience across all our inferencing solutions.

A graphic depicting that the Azure AI model inference API can be used to consume models from Cohere, Mistral, Meta LLama, Microsoft (including Phi-3) and Core42 JAIS, and it’s also compatible with Azure OpenAI Service model deployments.

This is the same API utilized within Azure AI Studio and Azure Machine Learning. You can use prompt flow to build intelligent experiences that can now leverage various models. Since all the models speak the same language, you can run evaluations to compare them across different tasks, determine which one to use for each use case, exploit their strengths, and build experiences that delight your customers.

A screenshot showing the comparison of 3 different evaluations of a prompt flow chat application that implements the RAG pattern. The evaluation was run using 3 different variations of the same prompt flow, each of them running GPT-3.5 Turbo, Mistral-Large, and Llama2-70B-chat, using the same prompt message for the generation step.

We see more customers eager to combine the innovation from across the industry and redefine what’s possible. They are either integrating foundational models as building blocks for their applications or by fine-tuning them to achieve niche capabilities in specific use cases. We hope these new set of capabilities unlock the experimentation and evaluation required to move across models, picking the right one for the right job.

We want to help customers to fulfill that mission, empowering every single AI developer to achieve more with Azure AI.

To lern more about the Azure AI model catalog and how the Azure AI model inference API provides a simplified access to it, see the following //build 2024 breakout session: