Blog Post

Microsoft Foundry Blog
6 MIN READ

Now in Foundry: Cohere Transcribe, Nanbeige 4.1-3B, and Octen Embedding

Osi's avatar
Osi
Icon for Microsoft rankMicrosoft
Apr 06, 2026

What's trending in Hugging Face? April 6, 2026

This week's Model Mondays edition spans three distinct layers of the AI application stack: Cohere's cohere-transcribe, a 2B Automatic Speech Recognition (ASR) model that ranks first on the Open ASR Leaderboard across 14 languages; Nanbeige's Nanbeige4.1-3B, a compact 3B reasoning model that outperforms models ten times its size on coding, math, and deep-search benchmarks; and Octen's Octen-Embedding-0.6B, a lightweight text embedding model that achieves strong retrieval scores across 100+ languages and industry-specific domains.

Together, these three models illustrate how developers can build full AI pipelines—from audio ingestion to language reasoning to semantic retrieval—entirely with open-source models deployed through Microsoft Foundry. Each operates in a different modality and fills a distinct architectural role, making this week's selection especially well-suited for teams assembling production-grade systems across speech, text, and search.

Models of the week

Cohere's cohere-transcribe-03-2026

Model Specs

  • Parameters / size: 2B
  • Primary task: Automatic Speech Recognition (audio-to-text)

Why it's interesting

  • Top-ranked on the Open ASR Leaderboard: cohere-transcribe-03-2026 achieves a 5.42% average Word Error Rate (WER) across 8 English benchmark datasets as of March 26, 2026—placing it first among open models. It reaches 1.25% WER on LibriSpeech Clean and 8.15% on AMI (meeting transcription), demonstrating consistent accuracy across both clean speech and real-world, multi-speaker environments. Benchmarks: Open ASR Leaderboard.
  • 14 languages with a dedicated encoder-decoder architecture: The model uses a large Conformer encoder for acoustic representation extraction paired with a lightweight Transformer decoder for token generation, trained from scratch on 14 languages covering European, East Asian (Chinese Mandarin, Japanese, Korean, Vietnamese), and Arabic. Unlike general-purpose models adapted for ASR, this dedicated architecture makes it efficient without sacrificing accuracy.
  • Long-form audio with automatic chunking: Audio longer than 35 seconds is automatically split into overlapping chunks and reassembled into a coherent transcript—no manual preprocessing required. Batched inference, punctuation control, and per-language configuration are all supported through the standard API.

Try it

Click on the window above, upload an audio file, and watch how quickly the model transcribes it for you. Or click the link to experiment with the Cohere Transcribe Space and record audio directly from your device.

Use Case

Prompt Pattern

Meeting transcription

Submit recorded audio with language tag; retrieve timestamped transcript per speaker turn

Call center quality review

Batch-process customer call recordings, extract transcript, pass to classification model

Medical documentation

Transcribe clinical encounters; feed transcript into summarization or structured note pipeline

Multilingual content indexing

Process podcasts or video audio in any of 14 supported languages; store as searchable text

Sample prompt for a legal services deployment:

You are building a contract negotiation assistant. A client submits a recorded audio of a 45-minute supplier negotiation call. Using the cohere-transcribe-03-2026 endpoint deployed in Microsoft Foundry, transcribe the call with punctuation enabled for the English audio. Once the transcript is available, pass it to a downstream language model with the following instruction: "Identify all pricing commitments, delivery deadlines, and liability clauses mentioned in this negotiation transcript. For each, note the speaker's position (client or supplier) and flag any terms that appear ambiguous or require legal review."

Nanbeige's Nanbeige4.1-3B

Model Specs

  • Parameters / size: 3B
  • Context length: 131,072 tokens
  • Primary task: Text generation (reasoning, coding, tool use, deep search)

Why it's interesting

  • Reasoning performance that exceeds its size class: Nanbeige4.1-3B scores 76.9 on LiveCodeBench-V6, these results suggest that targeted post-training using Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) on a focused dataset can yield improvements that scale-based approaches cannot replicate at equivalent parameter counts. Read the technical report: https://huggingface.co/papers/2602.13367.
  • Strong preference alignment at the 3B scale: On Arena-Hard-v2, Nanbeige4.1-3B scores 73.2, compared to 56.0 for Qwen3-32B and 60.2 for Qwen3-30B-A3B—both significantly larger models. This indicates that the model's outputs consistently match human preference for response quality and helpfulness, not just accuracy on structured tasks.
  • Deep-search capability previously absent from small general models: On xBench-DeepSearch-2505, Nanbeige4.1-3B scores 75—matching search-specialized small agents. The model can sustain complex agentic tasks involving more than 500 sequential tool invocations, a capability gap that previously required either specialized search agents or significantly larger models.
  • Native tool-use support: The model's chat template and generation pipeline natively support tool call formatting, making it straightforward to connect to external APIs and build multi-step agentic workflows without additional scaffolding.

Try it

Use Case

Prompt Pattern

Code review and fix

Provide failing test + stack trace; ask model to diagnose root cause and write corrected implementation

Competition-style math

Submit problem as structured prompt; use temperature 0.6, top-p 0.95 for consistent reasoning steps

Agentic task execution

Provide tool definitions as JSON + goal; let model plan and execute tool calls sequentially

Long-document Q&A

Pass full document (up to 131K tokens) with targeted factual questions; extract structured answers

Sample prompt for a software engineering deployment:

You are automating pull request review for a backend engineering team. Using the Nanbeige4.1-3B endpoint deployed in Microsoft Foundry, provide the model with a unified diff of a proposed code change and the following system instruction: "You are a senior software engineer reviewing a pull request. For each modified function: (1) summarize what the change does, (2) identify any edge cases that are not handled, (3) flag any security or performance regressions relative to the original, and (4) suggest a specific improvement if one is warranted. Format your output as a structured list per function."

Octen's Octen-Embedding-0.6B

Model Specs

  • Parameters / size: 0.6B
  • Context length: 32,768 tokens
  • Primary task: Text embeddings (semantic search, retrieval, similarity)

Why it's interesting

  • Retrieval performance above larger proprietary models at 0.6B: On the RTEB (Retrieval Text Embedding Benchmark) public leaderboard, Octen-Embedding-0.6B achieves a mean task score of 0.7241—above voyage-3.5 (0.7139), Cohere-embed-v4.0 (0.6534), and text-embedding-3-large (0.6110), despite being a fraction of their parameter count. The model is fine-tuned from Qwen3-Embedding-0.6B via Low-Rank Adaptation (LoRA), demonstrating that targeted fine-tuning on retrieval-specific data can close the gap with larger embedding models. 
  • Vertical domain coverage across legal, finance, healthcare, and code: Octen-Embedding-0.6B was trained with explicit coverage of domain-specific retrieval scenarios—legal document matching, financial report Q&A, clinical dialogue retrieval, and code search including SQL. This makes it suitable for regulated-industry applications where generic embedding models tend to underperform on specialized terminology.
  • 32,768-token context for long-document retrieval: The extended context window supports encoding entire legal contracts, earnings reports, or clinical case notes as single embeddings—removing the need to chunk long documents and re-aggregate scores at query time, which can introduce ranking errors.
  • 100+ language support with cross-lingual retrieval: The model handles multilingual and cross-lingual retrieval natively, with strong coverage across languages including English, Chinese, and other major languages via its Qwen3-based architecture—practical for global enterprise applications that span multiple languages.
  •  

Use Case

Prompt Pattern

Semantic search

Encode user query and document corpus; rank documents by cosine similarity to query embedding

Legal precedent retrieval

Embed case briefs and query with legal question; retrieve most semantically relevant precedents

Cross-lingual document search

Encode multilingual document set; submit query in any supported language for cross-lingual retrieval

Financial Q&A pipeline

Embed earnings reports or filings; retrieve relevant passages to ground downstream language model responses

Sample prompt for a global enterprise knowledge base deployment:

You are building a clinical decision support tool. Using the Octen-Embedding-0.6B endpoint deployed in Microsoft Foundry, embed a corpus of 10,000 clinical case notes at ingestion time and store the resulting 1024-dimensional vectors in a vector database. At query time, encode an incoming patient presentation summary and retrieve the 5 most semantically similar historical cases. Pass the retrieved cases and the current presentation to a language model with the following instruction: "Based on these five similar cases and their documented outcomes, summarize the most common treatment approaches and flag any cases where the outcome differed significantly from the initial prognosis."

Getting started

You can deploy open-source Hugging Face models directly in Microsoft Foundry by browsing the Hugging Face collection in the Foundry model catalog and deploying to managed endpoints in just a few clicks. You can also start from the Hugging Face Hub. First, select any supported model and then choose "Deploy on Microsoft Foundry", which brings you straight into Azure with secure, scalable inference already configured. Learn how to discover models and deploy them using Microsoft Foundry documentation:

Updated Apr 06, 2026
Version 2.0
No CommentsBe the first to comment