Azure AI Foundry Blog

Expanding the Llama 4 Herd: New Models Now Available on Azure AI Foundry

Naomi Moneypenny
Apr 14, 2025

The Llama 4 Scout and Maverick Instruct models are releasing as serverless offerings today. They are also available on GitHub Models.

Last week, we kicked off the arrival of Meta’s powerful new Llama 4 models in Azure with the launch of three models across Azure AI Foundry and Azure Databricks. Today, we’re expanding the herd with the addition of two new instruction-tuned models with 17B active parameters — now available in the Azure AI Foundry model catalog as Models as a Service (MaaS) endpoints.

New Models

  1. Llama-4-Scout-17B-16E-Instruct
    A fast, low-latency model with 17B active parameters and 16 experts — optimized for general-purpose tasks with strong instruction following.
  2. Llama-4-Maverick-17B-128E-Instruct-FP8
    A larger, more expressive variant with 17B active parameters, 128 experts, and FP8 precision — built for heavier, higher-quality reasoning under constrained compute.

Both models are:

  • Hosted as serverless MaaS endpoints — no infrastructure setup required
  • Available on GitHub Models and in the playground

What Makes These Llama 4 Models Special?

These models are part of Meta’s mixture-of-experts (MoE) family of Llama 4 variants. Unlike dense models, these MoE architectures selectively activate a subset of model parameters (experts) per token, yielding improved efficiency without sacrificing output quality.

  • Scout-17B-16E offers fast inference for common enterprise workloads like summarization, Q&A, and structured output tasks.
  • Maverick-17B-128E-FP8 introduces aggressive expert scaling and FP8 precision, enabling high-throughput inference with improved energy efficiency.
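To make per-token expert routing concrete, here is a toy, illustrative sketch in Python using NumPy. It is not Meta’s actual router; the gating weights, dimensions, and 16-expert layout are made-up stand-ins that simply echo a Scout-like 16E shape. The core idea is to score all experts, keep only the top few, and run just those for the current token.

    import numpy as np

    def moe_layer(token, experts, gate_w, top_k=2):
        # Score every expert for this token, then keep only the top_k of them.
        logits = gate_w @ token                      # shape: (num_experts,)
        chosen = np.argsort(logits)[-top_k:]         # indices of the selected experts
        weights = np.exp(logits[chosen])
        weights /= weights.sum()                     # softmax over the chosen experts only
        # Only the chosen experts run; the rest are skipped for this token.
        return sum(w * experts[i](token) for w, i in zip(weights, chosen))

    rng = np.random.default_rng(0)
    dim = 8
    # 16 toy "experts" (tiny linear maps), echoing a 16E layout.
    experts = [(lambda x, W=rng.standard_normal((dim, dim)): W @ x) for _ in range(16)]
    gate_w = rng.standard_normal((16, dim))

    token = rng.standard_normal(dim)
    print(moe_layer(token, experts, gate_w).shape)   # (8,) -- only 2 of 16 experts did work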

How to Get Started

You can find these models in the Azure AI Foundry model catalog — just search for "Llama-4" or navigate to the Meta model family. With a few clicks, you can:

  • Deploy the model as a serverless endpoint
  • Invoke it via the Azure AI Foundry playground
  • Integrate using the Azure OpenAI-compatible REST API or Python SDK (a minimal sketch follows below)
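
For the SDK route, the snippet below is a minimal sketch assuming the azure-ai-inference Python package and an already-deployed serverless endpoint; the environment variable names, prompt, and generation settings are placeholders to swap for your own deployment’s values.

    import os
    from azure.ai.inference import ChatCompletionsClient
    from azure.ai.inference.models import SystemMessage, UserMessage
    from azure.core.credentials import AzureKeyCredential

    # Placeholders: use the endpoint URL and key shown on your deployment's details page.
    client = ChatCompletionsClient(
        endpoint=os.environ["AZURE_INFERENCE_ENDPOINT"],
        credential=AzureKeyCredential(os.environ["AZURE_INFERENCE_KEY"]),
    )

    response = client.complete(
        messages=[
            SystemMessage(content="You are a helpful assistant."),
            UserMessage(content="Summarize the key ideas behind mixture-of-experts models."),
        ],
        temperature=0.7,
        max_tokens=512,
    )

    print(response.choices[0].message.content)

The same endpoint also answers plain HTTPS requests, so a REST call works anywhere the SDK is inconvenient.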

Use Cases

These 17B models are a great fit for:

  • Knowledge assistant copilots
  • Long-form summarization
  • Table-to-text transformation
  • Conversational agents
  • Internal developer tools

Explore More

To learn more about last week’s launch of Llama 4 models, check out the official Azure blog: Introducing the Llama 4 Herd in Azure AI Foundry and Azure Databricks

Try these models today in Azure AI Foundry and let us know what you build!


Updated Apr 14, 2025
Version 1.0