
Azure AI Foundry Blog

Cohere Models Now Available on Managed Compute in Azure AI Foundry Models

truptiparkar
Microsoft
Jun 13, 2025

Over the past year, we have launched several Cohere models on Azure as serverless Standard (pay-as-you-go) offerings. We're excited to announce that Cohere's latest models (Command A, Rerank 3.5, and Embed 4) are now available in Azure AI Foundry Models via Managed Compute.

This launch lets enterprises and developers deploy Cohere models instantly using their own Azure quota, with per-hour GPU pricing that compensates the model provider, unlocking a scalable, low-friction path to production-ready generative AI.

 

What is Managed Compute? 

Managed Compute is a deployment option within Azure AI Foundry Models that lets you run large language models (LLMs), small language models (SLMs), Hugging Face models, and custom models, fully hosted on Azure infrastructure.

 

Why Use Managed Compute? 

Azure Managed Compute is a powerful deployment option for models that are not available via standard (pay-as-you-go) endpoints. It gives you:

  • Custom model support: Deploy open-source or third-party models 
  • Infrastructure flexibility: Choose your own GPU SKUs (A10, A100, H100) 
  • Detailed control: Configure inference servers, protocols, and advanced settings 
  • Full integration: Works with Azure ML SDK, CLI, Prompt Flow, and REST APIs 
  • Enterprise-ready: Supports VNet, private endpoints, quotas, and scaling policies 

 

Cohere Models Now Available on Managed Compute

Command A 

  • Use case: Advanced generation, reasoning, agentic frameworks 
  • Pricing: $17.125 / GPU / hour 

Rerank 3.5 

  • Use case: Retrieval-Augmented Generation (RAG), semantic search, ranking 
  • Pricing: $3.50 / instance / hour 

Embed 4 

  • Use case: Text embeddings for vector search, clustering, classification 
  • Pricing: $2.94 / instance / hour 
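To put the per-hour rates above in context, here is a quick back-of-the-envelope estimate. It assumes roughly 730 hours in a month and a continuously running deployment; your actual Azure bill depends on uptime, region, instance count, and the GPU SKU you select.

```python
# Rough monthly cost estimate from the per-hour rates listed above.
# 730 is an approximate average of hours in a month (24 * 365 / 12);
# real billing depends on actual uptime, region, and SKU.
HOURS_PER_MONTH = 730

HOURLY_RATES = {
    "Command A": 17.125,  # $ / GPU / hour
    "Rerank 3.5": 3.50,   # $ / instance / hour
    "Embed 4": 2.94,      # $ / instance / hour
}

def monthly_cost(model: str, instances: int = 1,
                 hours: float = HOURS_PER_MONTH) -> float:
    """Estimated monthly cost for a continuously running deployment."""
    return HOURLY_RATES[model] * instances * hours

for model in HOURLY_RATES:
    print(f"{model}: ~${monthly_cost(model):,.2f}/month per instance")
```

With per-hour billing you pay only while the deployment is up, so scaling to zero outside business hours can cut these figures substantially.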

 

Why This Matters 

This is a big step forward for the model ecosystem. With managed compute, Azure makes it easy to: 

  • Access and pay for top-tier models like Cohere's by bringing your own compute
  • Support model builders by compensating them for usage 
  • Deploy production GenAI apps without infrastructure overhead 
  • Choose your performance tier: A10-, A100-, or H100-backed SKUs for latency-sensitive use cases

 

Get Started 

You can find these models in Azure AI Foundry Models. Just select your model, choose a deployment target, and launch with confidence; usage-based billing is already built in.

With Cohere's models now on Managed Compute, building GenAI apps with foundation models has never been faster, easier, or more enterprise-ready.

  • Cohere provides the model weights 
  • Azure hosts the model on managed VMs (A10/A100/H100 GPUs) 
  • Customers deploy and pay per hour, with usage automatically compensating Cohere
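As a sketch of what the deployment step can look like with the Azure ML CLI (v2), the YAML below defines a managed online deployment. The endpoint name, deployment name, instance type, and the model asset path are placeholders, not real asset IDs; copy the exact model URI from the model card in the Azure AI Foundry catalog instead.

```yaml
# managed-deployment.yml -- illustrative only; names and the model
# path below are placeholders, not real asset IDs.
$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineDeployment.schema.json
name: cohere-deployment          # placeholder deployment name
endpoint_name: cohere-endpoint   # placeholder endpoint name
model: azureml://registries/<registry-name>/models/<model-name>/versions/<version>
instance_type: Standard_NC24ads_A100_v4  # an A100-backed SKU; choose per your quota
instance_count: 1
```

Once the endpoint itself exists, a file like this can be applied with `az ml online-deployment create --file managed-deployment.yml`.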

 

Updated Jun 16, 2025
Version 3.0