Azure AI Foundry Blog

Save Big on Hosting Your Fine-Tuned Models on Azure OpenAI Service

AliciaFrame
Microsoft
Jul 18, 2024

We've heard your feedback loud and clear: folks want to fine-tune their models, but the pricing can make experimentation too expensive. Following last month's switch to token-based billing for training, we're reducing the hosting charges for many of your favorite models!

 

Starting July 1, we reduced the hosting charges for many Azure OpenAI Service fine-tuned models, including our most popular models, the GPT-35-Turbo family. For folks less familiar with our service: models need to be deployed before they can be used for inferencing, and while a model is deployed, we charge an hourly hosting rate. Don't need to use your model right away? We store up to 100 non-deployed fine-tuned models per resource, for free!

 

The new prices are published on the Azure OpenAI Service Pricing page and listed below:

| Base Model | Previous Price | New Price (Effective July 1, 2024) |
| --- | --- | --- |
| Babbage-002 | $1.70 / hour | $1.70 / hour |
| Davinci-002 | $2.00 / hour | $1.70 / hour (15% off) |
| GPT-35-Turbo (4K) | $3.00 / hour | $1.70 / hour (43% off) |
| GPT-35-Turbo (16K) | $3.00 / hour | $1.70 / hour (43% off) |

 

Why do we charge for hosting? When you deploy a fine-tuned model, you're covered by the same Azure OpenAI SLAs as our base models, with 99.9% uptime, and your model is hosted continuously on Azure infrastructure rather than being loaded on demand. This means that once your model is deployed, there's no wait for inferencing. And because you're paying for the deployment, we charge a relatively low price for inferencing (the same as the equivalent base model).

When comparing services, consider the tradeoff between a fixed hosting price and a higher per-token rate for inferencing. Because Azure OpenAI has a fixed hosting cost and low inferencing charges, it can be much cheaper for heavier inferencing workloads than services that simply charge a premium per token. For example, assuming a standard 8:1 ratio of input to output tokens and comparing the costs of using a fine-tuned GPT-35-Turbo model, once your workload surpasses roughly 700K tokens / hour (~12K TPM), Azure OpenAI becomes the cheaper option.
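The break-even math above can be sketched in a few lines. Note that every per-token price in this snippet is a hypothetical placeholder chosen for illustration, not an actual published rate; the point is the structure of the comparison (fixed fee + low token rate vs. token premium only), and the break-even volume you compute will depend entirely on the real prices you plug in.

```python
# Break-even sketch: fixed hourly hosting plus low per-token rates, versus a
# service that only charges a (higher) per-token premium.
# All per-1K-token prices below are HYPOTHETICAL placeholders, not real rates.

HOSTING_PER_HOUR = 1.70  # fixed hourly hosting fee (from the table above)

def blended_per_1k(input_per_1k, output_per_1k, in_ratio=8, out_ratio=1):
    """Blended price per 1K tokens at a given input:output ratio (8:1 here)."""
    total = in_ratio + out_ratio
    return (in_ratio * input_per_1k + out_ratio * output_per_1k) / total

azure_rate = blended_per_1k(0.0015, 0.002)  # assumed low per-token rates
other_rate = blended_per_1k(0.003, 0.006)   # assumed premium per-token rates

def azure_cost(tokens_per_hour):
    # Fixed hosting fee plus token charges.
    return HOSTING_PER_HOUR + tokens_per_hour / 1000 * azure_rate

def other_cost(tokens_per_hour):
    # No hosting fee, but a higher per-token rate.
    return tokens_per_hour / 1000 * other_rate

# Break-even volume: the hosting fee divided by the per-1K price difference.
break_even = HOSTING_PER_HOUR / (other_rate - azure_rate) * 1000
print(f"break-even: ~{break_even:,.0f} tokens/hour (~{break_even / 60:,.0f} TPM)")
```

Below the break-even volume the premium-only service is cheaper; above it, the fixed-hosting model wins, and the gap widens the more you inference.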

 

We hope this makes it easier for you to use these models and explore their capabilities. Thank you for choosing Azure OpenAI Service. Happy fine-tuning!


1 Comment

andrewwcbrown
Copper Contributor

So today I was recording an educational video to demonstrate fine-tuning for the AI-102, but I was afraid it was too expensive ($34 / hour). These new price changes are great, so I can now go back and properly record a fine-tuning video on Azure AI Studio!

    I don't fully understand why newer models are not fine-tunable. It's currently my guess that it is most cost-effective to fine-tune these "completion models", and maybe newer models like GPT4 would be too costly.

I wonder how useful you can make these older models. I imagine fine-tuning can make them a lot more useful while staying cost-effective, but I have to devise a test case.