We are happy to announce that the Meta Llama 3.3 70B is now available in the Azure AI Foundry Model Catalog. This milestone is part of our ongoing collaboration with Meta to deliver open generative AI models that combine performance, accessibility, and responsibility. The Meta Llama 3.3 70B, embodies the best of what Meta has to offer and offers performance similar to Meta Llama 405B making it easier for developers to build, experiment, and deploy AI-driven solutions at scale at a lower cost.
What’s New in Meta Llama 3.3 70B?
The latest Llama model focuses on enhancements in reasoning, coding, and instruction following, making it one of the most versatile and advanced open models available. Key features include:
- Improved Outputs: Generate step-by-step reasoning and accurate JSON responses for structured data requirements.
- Expanded Language Support: Multilingual capabilities in eight major languages, including English, French, Hindi, and Thai.
- Enhanced Coding Capabilities: Greater coverage of programming languages, improved error handling, and detailed code feedback.
- Task-Aware Tool Use: Smarter tool invocation that respects predefined parameters and avoids unnecessary calls.
This model ensures developers achieve similar performance to the larger 405B model but at a fraction of the cost, making high-quality generative AI accessible to a broader audience.
A Foundation for Responsible AI
According to Meta, Llama models are designed with responsibility at its core. Meta’s safety protocols ensure the model is not only powerful but also aligned with ethical AI standards. Features like Llama Guard 3 and Prompt Guard offer developers built-in safeguards to prevent misuse, making Llama an ideal choice for safe AI deployment.
Scaling Innovation with Azure AI Foundry
The integration of Llama 3.3 70B into the Azure AI Foundry Model Catalog further strengthens Azure's position as a leader in AI innovation. Developers using Azure now have:
- Unmatched Accessibility: Llama models are readily available through Azure AI Foundry, simplifying adoption for developers worldwide.
- Seamless Integration: Developers can integrate Llama into their existing workflows, leveraging tools like GitHub and Visual Studio Code for a frictionless experience.
- Unparalleled Support: Backed by Azure’s infrastructure, the Llama models offer scalability, security, and reliability for enterprises and startups alike.
Driving Real-World Impact
Llama models have already demonstrated their potential to transform industries:
- Education: Empowering students and educators with multilingual AI assistants.
- Software Development: Enhancing productivity with accurate coding assistance.
- Enterprise Applications: Streamlining customer support, data analysis, and content generation.
As the global community continues to adopt generative AI, the Meta Llama 3.3 70B model will play a pivotal role in unlocking new possibilities while ensuring safety and inclusivity.
Getting Started with Llama 3.3 70B on MaaS
To get started with Azure AI Foundry and deploy your first model, follow these clear steps:
- Familiarize Yourself: If you're new to Azure AI Foundry, start by reviewing this documentation to understand the basics and set up your first project.
- Access the Model Catalog: Open the model catalog in AI Foundry.
- Find the Model: Use the filter to select the Meta collection or click the “View models” button on the MaaS announcement card.
- Select the Model: Open the Llama-3.1 model from the list.
- Deploy the Model: Click on ‘Deploy’ and choose the managed compute option
Try Llama 3.3 on Azure AI Foundry Today
With Llama 3.3 70B now live on Azure AI Foundry, it’s easier than ever to bring your AI ideas to life. Whether you’re a developer, researcher, or enterprise innovator, the Llama ecosystem offers the tools and resources you need to succeed.
Explore Llama 3.3 today on Azure AI Foundry and experience the future of open, responsible, and scalable AI. Together with Meta, Microsoft is shaping the next chapter of generative AI innovation. Let’s build the future—responsibly.
FAQ
- What does it cost to use Llama 3.3 models on Azure?
- For managed compute deployments, you’ll be billed based on the minimum GPU SKU used for deployment, provided you have sufficient GPU quota.
- For models via MaaS, you’ll be billed based on the prompt and completion tokens. Pricing will be available soon, seen in Azure AI Foundry (Marketplace Offer details tab when deploying the model) and Azure Marketplace.
- Do I need GPU capacity in my Azure subscription to use Llama 3.3 models?
- Yes, for models available via managed compute deployment, you will need GPU capacity by model.
- When you deploy the model, you’ll see the VM that is automatically selected for deployment.
- For the 11B Vision Instruct and 90B Vision Instruct available via serverless API (coming soon), no GPU capacity is required.
 
- Given that Llama 3.3 will be billed through the Azure Marketplace, would it retire my Azure consumption commitment (aka MACC) when these models are available via MaaS?
- Yes, Llama 3.3 70B models will be “Azure benefit eligible” Marketplace offer, indicating MACC eligibility. Learn more about MACC here: https://learn.microsoft.com/en-us/marketplace/azure-consumption-commitment-benefit
- Is my inference data shared with Meta?
- No, Microsoft does not share the content of any inference request or response data with Meta.
- Are there rate limits for the Meta models on Azure?
- Meta models come with 200k tokens per minute and 1k requests per minute limit. Reach out to Azure customer support if this doesn’t suffice.
- Can I use MaaS models in any Azure subscription types?
- Customers can use MaaS models in all Azure subscription types with a valid payment method, except for the CSP (Cloud Solution Provider) program. Free or trial Azure subscriptions are not supported.
- Can I fine-tune Llama 3.3 models?
- Fine-tuning for this model is coming soon.
Team MSFT - Defer to you all on how these should be updated to reflect the new model number.