State-of-the-art open models, served through Fireworks AI's optimized deployments, now in Microsoft Foundry
We’re excited to announce that starting today, Microsoft Foundry customers can access high-performance, low-latency inference for popular open models hosted on the Fireworks cloud directly from their Foundry projects, and even deploy their own customized versions, too!
As part of the Public Preview launch, we’re offering the most popular open models for serverless inference in both pay-per-token (US Data Zone) and provisioned throughput (Global Provisioned Managed) deployments. This includes:
- Minimax M2.5 🆕
- OpenAI’s gpt-oss-120b
- MoonshotAI’s Kimi-K2.5
- DeepSeek-v3.2
If you’ve been looking for a path to production for models you’ve post-trained, you can now import your own fine-tuned versions of popular open models and deploy them at production scale with Fireworks AI on Microsoft Foundry.
The Microsoft Foundry model catalog showing the new Fireworks on Foundry model collection.
Serverless (pay-per-token)
For customers wanting per-token pricing, we’re launching with Data Zone Standard in the United States. You can create model deployments for Foundry resources in the following regions:
- East US
- East US 2
- Central US
- North Central US
- West US
- West US 3
Depending on your Azure subscription type, you’ll automatically receive either a 250K or 25K tokens per minute (TPM) quota limit per region and model. (Azure Student and Trial subscriptions will not receive quota at this time.)
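Serverless deployments expose the familiar chat-completions request shape. The snippet below is a minimal sketch of a request body, assuming an OpenAI-compatible endpoint on your Foundry resource; the endpoint URL and deployment name are placeholders, not official values:

```python
import json

# Placeholder values -- substitute your own Foundry resource endpoint
# and the name you gave your serverless model deployment.
ENDPOINT = "https://<your-resource>.services.ai.azure.com/openai/v1/chat/completions"
DEPLOYMENT = "gpt-oss-120b"

body = {
    "model": DEPLOYMENT,
    "messages": [
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Summarize the benefits of serverless inference."},
    ],
    "max_tokens": 256,
}

# POST this JSON to ENDPOINT with your API key; only the payload is shown here.
print(json.dumps(body, indent=2))
```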
Per-token pricing rates include input, cached input, and output tokens priced per million tokens.
| Model | Input ($/1M tokens) | Cached Input ($/1M tokens) | Output ($/1M tokens) |
|---|---|---|---|
| gpt-oss-120b | $0.17 | $0.09 | $0.66 |
| kimi-k2.5 | $0.66 | $0.11 | $3.30 |
| deepseek-v3.2 | $0.62 | $0.31 | $1.85 |
| minimax-m2.5 | $0.33 | $0.03 | $1.32 |
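As a rough illustration of how per-token pricing adds up, the sketch below estimates the cost of a single request from the rates above. The token counts are illustrative, and it assumes cached input tokens bill at the discounted cached rate in place of the standard input rate:

```python
# Per-million-token rates (USD) from the pricing table above.
RATES = {
    "gpt-oss-120b":  {"input": 0.17, "cached": 0.09, "output": 0.66},
    "kimi-k2.5":     {"input": 0.66, "cached": 0.11, "output": 3.30},
    "deepseek-v3.2": {"input": 0.62, "cached": 0.31, "output": 1.85},
    "minimax-m2.5":  {"input": 0.33, "cached": 0.03, "output": 1.32},
}

def estimate_cost(model, input_tokens, cached_tokens, output_tokens):
    """Estimate USD cost of one request; cached tokens bill at the cached rate."""
    r = RATES[model]
    fresh = input_tokens - cached_tokens  # input tokens not served from cache
    return (fresh * r["input"]
            + cached_tokens * r["cached"]
            + output_tokens * r["output"]) / 1_000_000

# A 100K-token prompt (half cached) producing a 2K-token answer on gpt-oss-120b:
print(round(estimate_cost("gpt-oss-120b", 100_000, 50_000, 2_000), 4))  # → 0.0143
```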
As we work with Fireworks to launch the latest open-source models, the supported list will evolve as research labs push the frontier!
Provisioned Throughput
For customers looking to shift or scale production workloads on these models, we’re launching with support for Global provisioned throughput. (Data Zone support will be coming soon!)
Provisioned throughput for Fireworks models works just like it does for Foundry models: PTUs are designed to deliver consistent performance in terms of inter-token latency. Your existing quota for Global PTUs applies, as do any reservation commitments!
| | gpt-oss-120b | Kimi-K2.5 | DeepSeek-v3.2 | MiniMax-M2.5 |
|---|---|---|---|---|
| Global provisioned minimum deployment | 80 | 800 | 1,200 | 400 |
| Global provisioned scale increment | 40 | 400 | 600 | 200 |
| Input TPM per PTU | 13,500 | 530 | 1,500 | 3,000 |
| Latency Target Value | 99% > 50 Tokens Per Second^ | 99% > 50 Tokens Per Second^ | 99% > 50 Tokens Per Second^ | 99% > 50 Tokens Per Second^ |
^ Calculated as p50 request latency on a per-5-minute basis.
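Using the table above, you can back into a deployment size from an expected input-token rate. The helper below is a sketch of that arithmetic only, not an official sizing tool: it divides a target input TPM by the per-PTU throughput, then rounds up to the minimum deployment size and scale increment:

```python
import math

# Global provisioned values copied from the table above.
PTU_SPECS = {
    "gpt-oss-120b":  {"min": 80,    "increment": 40,  "input_tpm_per_ptu": 13_500},
    "Kimi-K2.5":     {"min": 800,   "increment": 400, "input_tpm_per_ptu": 530},
    "DeepSeek-v3.2": {"min": 1_200, "increment": 600, "input_tpm_per_ptu": 1_500},
    "MiniMax-M2.5":  {"min": 400,   "increment": 200, "input_tpm_per_ptu": 3_000},
}

def ptus_for_input_tpm(model, target_input_tpm):
    """Smallest valid PTU count covering the target input tokens-per-minute."""
    spec = PTU_SPECS[model]
    raw = math.ceil(target_input_tpm / spec["input_tpm_per_ptu"])
    if raw <= spec["min"]:
        return spec["min"]  # the minimum deployment already covers the target
    # Otherwise round the excess up to the next scale increment.
    extra = math.ceil((raw - spec["min"]) / spec["increment"]) * spec["increment"]
    return spec["min"] + extra

# Sizing for 2M input tokens/min on gpt-oss-120b:
print(ptus_for_input_tpm("gpt-oss-120b", 2_000_000))  # → 160
```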
Custom Models
Have you post-trained a model like gpt-oss-120b for your particular use case? With Fireworks on Foundry, you can deploy, govern, and scale your custom models entirely within your Foundry project. Full fine-tuned versions of models from the following families can be imported and deployed as part of the preview:
- Qwen3-14B
- OpenAI gpt-oss-120b
- Kimi K2 and K2.5
- DeepSeek v3.1 and v3.2
The new Custom Models page in the Models experience lets you initiate the import process for copying your model weights into your Foundry project.
Importing Custom Models into Microsoft Foundry is available under Build -> Models -> Custom Models.
To transfer model weight files into Foundry at high speed, we’ve added a new feature to the Azure Developer CLI (azd) that copies a directory of model weights. The Foundry UI provides CLI arguments you can copy and paste to quickly run azd ai models create against your Foundry project.
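The general command shape is sketched below; the exact arguments are generated for you in the Foundry UI, so treat everything after the subcommand as a placeholder rather than documented flags:

```shell
# Placeholder sketch -- copy the exact arguments from the Custom Models
# import page in the Foundry UI; do not rely on these placeholder values.
azd ai models create <arguments-copied-from-the-foundry-ui>
```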
Enabling Fireworks AI on Microsoft Foundry in your Subscription
While in preview, customers must opt-in to integrate their Microsoft Foundry resources with the Fireworks inference cloud to perform model deployments and send inference requests. Opt-in is self-service and available in the Preview features panel within your Azure portal.
The Azure Preview features panel for an Azure subscription where you can enable the Fireworks on Foundry experience.
For additional details on finding and enabling the preview feature, please see the new product documentation for Fireworks on Foundry.
Frequently Asked Questions
How are Fireworks AI on Microsoft Foundry models different from Foundry Models?
Models provided directly from Azure include some open-source models as well as proprietary models from labs like Black Forest Labs, Cohere, and xAI. These models undergo rigorous model safety and risk assessments based on Microsoft’s Responsible AI standard.
For customers needing the latest open-source models from emerging frontier labs, breakneck speed, or the ability to deploy their own post-trained custom models, Fireworks delivers best-in-class inference performance. Whether you’re focused on minimizing latency or staying ahead of the trends, Fireworks AI on Microsoft Foundry gives you additional choice in the model catalog.
Still need to quantify model safety and risk? Foundry provides a suite of observability tools with built-in risk and safety evaluators, letting you build AI systems confidently.
How is model retirement handled?
Customers using serverless per-token offers of models via Fireworks on Foundry will receive at least 30 days’ notice before a model is retired. We’ll recommend upgrading to either an equivalent, longer-term supported Azure Direct model or a newer model provided by Fireworks.
Customers who want to use models beyond the retirement period may do so via provisioned throughput deployments.
How can I get more quota?
For TPM quota, you may submit requests via our current Fireworks on Foundry quota form.
For PTU quota, please contact your Microsoft account team.
Can you support my custom model?
Let’s talk! In general, if your model meets Fireworks’ current requirements, we have you covered. Reach out to your Microsoft account team or to any contacts you already have at Fireworks.