
Azure AI Foundry Blog

Cohere Models Now Available on Managed Compute in Azure AI Foundry Models

truptiparkar
Microsoft
Jun 13, 2025

Over the past year, we have launched several Cohere models on Azure as serverless Standard (pay-as-you-go) offerings. We're excited to announce that Cohere's latest models (Command A, Rerank 3.5, and Embed 4) are now available in Azure AI Foundry Models via Managed Compute.

This launch lets enterprises and developers deploy Cohere models instantly using their own Azure quota, with per-hour GPU pricing that compensates the model provider, unlocking a scalable, low-friction path to production-ready generative AI.

 

What is Managed Compute? 

Managed Compute is a deployment option within Azure AI Foundry Models that lets you run large language models (LLMs), small language models (SLMs), Hugging Face models, and custom models, fully hosted on Azure infrastructure.

 

Why Use Managed Compute? 

Azure Managed Compute is a powerful deployment option for models that are not available via standard (pay-as-you-go) endpoints. It gives you:

  • Custom model support: Deploy open-source or third-party models 
  • Infrastructure flexibility: Choose your own GPU SKUs (A10, A100, H100) 
  • Detailed control: Configure inference servers, protocols, and advanced settings 
  • Full integration: Works with Azure ML SDK, CLI, Prompt Flow, and REST APIs 
  • Enterprise-ready: Supports VNet, private endpoints, quotas, and scaling policies 

 

Cohere Models Now Available on Managed Compute

Command A 

  • Use case: Advanced generation, reasoning, agentic frameworks 
  • Pricing: $17.125 / GPU / hour 

Rerank 3.5 

  • Use case: Retrieval-Augmented Generation (RAG), semantic search, ranking 
  • Pricing: $3.50 / instance / hour 

Embed 4 

  • Use case: Text embeddings for vector search, clustering, classification 
  • Pricing: $2.94 / instance / hour 
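To put the per-hour rates above in context, here is a quick back-of-the-envelope estimate. It assumes roughly 730 hours in a month and a continuously running deployment; your actual Azure bill depends on uptime, region, instance count, and the GPU SKU you select.

```python
# Rough monthly cost estimate from the per-hour rates listed above.
# 730 is an approximate average of hours in a month (24 * 365 / 12);
# real billing depends on actual uptime, region, and SKU.
HOURS_PER_MONTH = 730

HOURLY_RATES = {
    "Command A": 17.125,  # $ / GPU / hour
    "Rerank 3.5": 3.50,   # $ / instance / hour
    "Embed 4": 2.94,      # $ / instance / hour
}

def monthly_cost(model: str, instances: int = 1,
                 hours: float = HOURS_PER_MONTH) -> float:
    """Estimated monthly cost for a continuously running deployment."""
    return HOURLY_RATES[model] * instances * hours

for model in HOURLY_RATES:
    print(f"{model}: ~${monthly_cost(model):,.2f}/month per instance")
```

With per-hour billing you pay only while the deployment is up, so scaling to zero outside business hours can cut these figures substantially.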

 

Why This Matters 

This is a big step forward for the model ecosystem. With managed compute, Azure makes it easy to: 

  • Access and pay for top-tier models like Cohere's by bringing your own compute
  • Support model builders by compensating them for usage 
  • Deploy production GenAI apps without infrastructure overhead 
  • Choose your performance tier: A10-, A100-, or H100-backed SKUs for latency-sensitive use cases

 

Get Started 

You can find these models in Azure AI Foundry Models. Just select your model, choose a deployment target, and launch with confidence; usage-based billing is already built in.

With Cohere's models now on Managed Compute, building GenAI apps with foundation models has never been faster, easier, or more enterprise-ready.

  • Cohere provides the model weights 
  • Azure hosts the model on managed VMs (A10/A100/H100 GPUs) 
  • Customers deploy and pay per hour, with usage automatically compensating Cohere
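As a sketch of what the deployment step can look like with the Azure ML CLI (v2), the YAML below defines a managed online deployment. The endpoint name, deployment name, instance type, and the model asset path are placeholders, not real asset IDs; copy the exact model URI from the model card in the Azure AI Foundry catalog instead.

```yaml
# managed-deployment.yml -- illustrative only; names and the model
# path below are placeholders, not real asset IDs.
$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineDeployment.schema.json
name: cohere-deployment          # placeholder deployment name
endpoint_name: cohere-endpoint   # placeholder endpoint name
model: azureml://registries/<registry-name>/models/<model-name>/versions/<version>
instance_type: Standard_NC24ads_A100_v4  # an A100-backed SKU; choose per your quota
instance_count: 1
```

Once the endpoint itself exists, a file like this can be applied with `az ml online-deployment create --file managed-deployment.yml`.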

 

Updated Jun 16, 2025
Version 3.0