
Microsoft Foundry Blog

Now in Foundry: Qwen3-Coder-Next, Qwen3-ASR-1.7B, Z-Image

vaidyas
Feb 09, 2026

What's trending in Hugging Face? Feb 9, 2026

This week's spotlight features three models from Hugging Face that demonstrate enterprise-grade AI across the full scope of modalities. From low-latency coding agents to state-of-the-art multilingual speech recognition and foundation-quality image generation, these models showcase the breadth of innovation happening in open-source AI. Each model balances performance with practical deployment considerations, making them viable for production systems while pushing the boundaries of what's possible in their respective domains.

This week's Model Mondays edition highlights Qwen3-Coder-Next, an 80B MoE model that activates only 3B parameters while delivering coding agent capabilities with 256k context; Qwen3-ASR-1.7B, which achieves state-of-the-art accuracy across 52 languages and dialects; and Z-Image from Tongyi-MAI, an undistilled text-to-image foundation model with full Classifier-Free Guidance support for professional creative workflows.

Models of the week

Qwen: Qwen3-Coder-Next

Model Specs

  • Parameters / size: 80B total (3B activated)
  • Context length: 262,144 tokens
  • Primary task: Text generation (coding agents, tool use)

Why it's interesting

  • Extreme efficiency: Activates only 3B of 80B parameters while delivering performance comparable to models with 10-20x more active parameters, making advanced coding agents viable for local deployment on consumer hardware
  • Built for agentic workflows: Excels at long-horizon reasoning, complex tool usage, and recovering from execution failures, capabilities critical for autonomous development workflows that go beyond simple code completion
  • Benchmarks: Competitive performance with significantly larger models on SWE-bench and coding benchmarks (Technical Report)

Try it

  • Code generation with tool use: Provide task context, available tools, and execution environment details
  • Long-context refactoring: Include full codebase context within the 256k window with specific refactoring goals
  • Autonomous debugging: Present error logs, stack traces, and relevant code with failure recovery instructions
  • Multi-file code synthesis: Describe architecture requirements and file structure expectations

Financial services sample prompt:

You are a coding agent for a fintech platform. Implement a transaction reconciliation service that processes batches of transactions, detects discrepancies between internal records and bank statements, and generates audit reports. Use the provided database connection tool, logging utility, and alert system. Handle edge cases including partial matches, timing differences, and duplicate transactions. Include unit tests with 90%+ coverage.
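
Once the model is deployed to a managed endpoint, a minimal sketch of sending that prompt through the azure-ai-inference Python SDK might look like the following. The endpoint URL, key, and environment variable names are placeholders for your own deployment, and the generation settings are illustrative.

```python
# pip install azure-ai-inference
import os

from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import SystemMessage, UserMessage
from azure.core.credentials import AzureKeyCredential

# Placeholder endpoint and key for a Qwen3-Coder-Next managed deployment.
client = ChatCompletionsClient(
    endpoint=os.environ["FOUNDRY_ENDPOINT"],
    credential=AzureKeyCredential(os.environ["FOUNDRY_KEY"]),
)

response = client.complete(
    messages=[
        SystemMessage(content="You are a coding agent for a fintech platform."),
        UserMessage(content=(
            "Implement a transaction reconciliation service that processes batches of "
            "transactions, detects discrepancies between internal records and bank "
            "statements, and generates audit reports. Handle partial matches, timing "
            "differences, and duplicate transactions. Include unit tests."
        )),
    ],
    max_tokens=4096,
    temperature=0.2,  # a low temperature tends to suit deterministic code generation
)

print(response.choices[0].message.content)
```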

Qwen: Qwen3-ASR-1.7B

Model Specs

  • Parameters / size: 1.7B
  • Context length: 256 tokens (default), configurable up to 4096
  • Primary task: Automatic speech recognition (multilingual)

Why it's interesting

  • All-in-one multilingual capability: Single 1.7B model handles language identification plus speech recognition for 30 languages, 22 Chinese dialects, and English accents from multiple regions—eliminating the need to manage separate models per language
  • Specialized audio versatility: Transcribes not just clean speech but singing voice, songs with background music, and extended audio files, expanding use cases beyond traditional ASR to entertainment and media workflows
  • State-of-the-art accuracy: Outperforms GPT-4o, Gemini-2.5, and Whisper-large-v3 across multiple benchmarks. English: Tedlium 4.50 WER vs 7.69/6.15/6.84; Chinese: WenetSpeech 4.97/5.88 WER vs 15.30/14.43/9.86 (Technical Paper)
  • Language ID included: 97.9% average accuracy across benchmark datasets for automatic language identification, eliminating the need for separate language detection pipelines

Try it

  • Multilingual transcription: Send audio files via API with automatic language detection
  • Call center analytics: Process customer service recordings to extract transcripts and identify languages
  • Content moderation: Transcribe user-generated audio content across multiple languages
  • Meeting transcription: Convert multilingual meeting recordings to text for documentation

Customer support sample prompt:

Deploy Qwen3-ASR-1.7B to a Microsoft Foundry endpoint and transcribe multilingual customer service calls. Send audio files via API to automatically detect the language (from 52 supported options including 30 languages and 22 Chinese dialects) and generate accurate transcripts. Process calls from customers speaking English, Spanish, Mandarin, Cantonese, Arabic, French, and other languages without managing separate models per language. Use transcripts for quality assurance, compliance monitoring, and customer sentiment analysis.
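
The request below is a hypothetical sketch of calling such a deployment with the requests library. The field names (audio, language), environment variables, and response shape are illustrative assumptions, so check the scoring schema exposed by your actual endpoint before relying on them.

```python
# pip install requests
import base64
import os

import requests

# Placeholder values for a Qwen3-ASR-1.7B managed deployment.
ENDPOINT = os.environ["ASR_ENDPOINT"]  # scoring URL of your deployment
API_KEY = os.environ["ASR_KEY"]

# Read and base64-encode a customer-call recording.
with open("support_call.wav", "rb") as f:
    audio_b64 = base64.b64encode(f.read()).decode("utf-8")

# Hypothetical payload shape: field names vary by deployment,
# so confirm the schema before using this in production.
payload = {
    "audio": audio_b64,
    "language": "auto",  # let the model perform language identification itself
}

resp = requests.post(
    ENDPOINT,
    json=payload,
    headers={"Authorization": f"Bearer {API_KEY}"},
    timeout=120,
)
resp.raise_for_status()
print(resp.json())  # expected to contain the transcript and detected language
```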

Tongyi-MAI: Z-Image

Model Specs

  • Parameters / size: 6B
  • Context length: N/A (text-to-image)
  • Primary task: Text-to-image generation

Why it's interesting

  • Undistilled foundation model: Full-capacity base without distillation preserves complete training signal with Classifier-Free Guidance support (a technique that improves prompt adherence and output quality), enabling complex prompt engineering and negative prompting that distilled models cannot achieve
  • High output diversity: Generates distinct character identities in multi-person scenes with varied compositions, facial features, and lighting, critical for creative applications requiring visual variety rather than consistency
  • Aesthetic versatility: Handles diverse visual styles from hyper-realistic photography to anime and stylized illustrations within a single model, supporting resolutions from 512×512 to 2048×2048 at any aspect ratio with 28-50 inference steps (Technical Paper)

Try it

  • Product photography: Describe the subject, setting, lighting, camera details, and desired style
  • Multi-character scenes: Specify the number of subjects and distinct appearance details for each
  • Style exploration: Name the target aesthetic, from photorealistic photography to anime and stylized illustration
  • Prompt refinement: Pair a detailed positive prompt with a negative prompt to steer away from unwanted elements

E-commerce sample prompt:

Professional product photography of a modern ergonomic office chair in a bright Scandinavian-style home office. Natural window lighting from left, clean white desk with laptop and succulent plant, light oak hardwood floor. Chair positioned at 45-degree angle showing design details. Photorealistic, commercial photography, sharp focus, 85mm lens, f/2.8, soft shadows.
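
As a rough sketch, and assuming Z-Image loads through the generic Hugging Face diffusers DiffusionPipeline interface (verify the exact pipeline class and repo id on the model card), a local generation run with the prompt above, a negative prompt, and the CFG and step settings mentioned earlier could look like this:

```python
# pip install diffusers torch
import torch
from diffusers import DiffusionPipeline

# Assumption: the model exposes a standard diffusers pipeline;
# confirm the repo id and pipeline class on the Tongyi-MAI model card.
pipe = DiffusionPipeline.from_pretrained(
    "Tongyi-MAI/Z-Image",
    torch_dtype=torch.bfloat16,
).to("cuda")

prompt = (
    "Professional product photography of a modern ergonomic office chair in a bright "
    "Scandinavian-style home office. Natural window lighting from left, light oak "
    "hardwood floor, photorealistic, commercial photography, sharp focus, 85mm lens, f/2.8."
)

image = pipe(
    prompt=prompt,
    negative_prompt="blurry, low quality, watermark, text artifacts",  # CFG enables negative prompting
    guidance_scale=5.0,       # classifier-free guidance strength (illustrative value)
    num_inference_steps=40,   # within the 28-50 step range noted above
    height=1024,
    width=1024,               # resolutions from 512x512 to 2048x2048 are supported
).images[0]

image.save("office_chair.png")
```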

Getting started

You can deploy open‑source Hugging Face models directly in Microsoft Foundry by browsing the Hugging Face collection in the Foundry model catalog and deploying to managed endpoints in just a few clicks. You can also start from the Hugging Face Hub: select any supported model and choose "Deploy on Microsoft Foundry", which brings you straight into Azure with secure, scalable inference already configured. Learn how to discover and deploy models in the Microsoft Foundry documentation.

Updated Feb 09, 2026
Version 3.0