Blog Post

Microsoft Foundry Blog
4 MIN READ

Now in Foundry: Qwen3.5 Medium Model Series

Osi's avatar
Osi
Icon for Microsoft rankMicrosoft
Mar 02, 2026

What's trending in Hugging Face? March 2, 2026

This week's spotlight focuses on the Qwen3.5 Medium Model Series, now available in Microsoft Foundry.  All three models are Vision Language Models (VLMs) built with early-fusion multimodal training, a 262K native context window, and support for 201 languages, released under Apache 2.0. They range from a 27B dense model optimized for latency-sensitive deployments to a 122B sparse Mixture-of-Experts (MoE) model that activates only 10B parameters per inference call, delivering frontier-class multimodal performance at lower inference cost.

Models of the week

What the Qwen3.5 Medium Model Series brings

Before looking at each model individually, three architectural advances apply to all three and are worth understanding:

  • Unified Vision-Language training (early fusion): Rather than attaching a separate vision encoder to a text model as an afterthought, Qwen3.5 trains on text and image tokens together from the beginning. This can enable stronger reasoning over diagrams, charts, and documents compared to prior Qwen3-VL models, which used a separate vision pipeline.
  • Gated Delta Networks: A novel linear attention mechanism that replaces standard self-attention in most transformer layers. Combined with sparse MoE routing in the two larger models, this hybrid can deliver high-throughput inference at lower latency than equivalent dense architectures.
  • Scalable RL across agent environments: Post-training uses reinforcement learning scaled across large multi-agent environments, contributing to strong performance on instruction-following and agentic task benchmarks.

On vision-language reasoning tasks like MMMU and MathVista, these are models small enough to run on local hardware, yet competitive with large, frontier models on multimodal benchmarks.

Figure 1. As seen in the chart above, Qwen 3.5 models have comparable performance to frontier models of larger sizes.

Qwen3.5-27B

Model Specs

  • Parameters / size: 27B (dense)
  • Context length: 262,144 tokens
  • Primary task: Vision Language Model (image-text-to-text)

Why it's interesting (Spotlight)

  • The dense baseline of the family: Unlike its MoE siblings, Qwen3.5-27B activates all 27B parameters on every forward pass. This gives it predictable, consistent latency per token—an important property for real-time applications and latency-sensitive deployments where MoE routing variability is a concern.
  • Instruction-following leader across the family: Scores 95.0 on IFEval, the highest in the family (vs 93.4 for 122B-A10B and 91.9 for 35B-A3B), and 76.5 on IFBench—making it the strongest choice for structured-output tasks, complex multi-step instruction chains, and agent scaffolds that rely on precise format compliance.

 

Try it

You're building a visual quality inspection system for a circuit board manufacturer. Deploy Qwen3.5-27B in Microsoft Foundry to process images captured by a production line camera.

Manufacturing sample prompt: Given an image of a printed circuit board (PCB), identify visible defects such as solder bridges, missing components, or misaligned pads. Return a JSON object with defect type, approximate board location, and severity (low / medium / high). Flag any board containing at least one high-severity defect for immediate rework routing.

Qwen3.5-35B-A3B

Model Specs

  • Parameters / size: 35B total, 3B activated per forward pass (MoE)
  • Context length: 262,144 tokens
  • Primary task: Vision Language Model (image-text-to-text)

Why it's interesting (Spotlight)

  • The throughput-optimized pick: With only 3B parameters active per token despite a 35B parameter pool, this model delivers performance close to much larger dense models at substantially lower inference cost.
  • 256-expert MoE routing at compact scale: Routes each token through 8 of 256 routed experts plus 1 shared expert. This breadth of specialization at a scale that only activates 3B parameters makes the 35B-A3B well-suited for high-throughput serving scenarios where cost per inference matters.

Try it

You're building a contract review assistant for an in-house legal team at a multinational company. Deploy Qwen3.5-35B-A3B in Microsoft Foundry to process scanned contract pages provided as images.

Legal document sample prompt: Given a page from a commercial services agreement, extract all defined terms, identify obligation and liability clauses, and flag any termination conditions that deviate from standard commercial practice. Return a structured summary with clause type, section reference, and a one-sentence plain-language explanation of each flagged item.

Qwen3.5-122B-A10B

Model Specs

  • Parameters / size: 122B total, 10B activated per forward pass (MoE)
  • Context length: 262,144 tokens
  • Primary task: Vision Language Model (image-text-to-text)

 

Why it's interesting (Spotlight)

  • Highest capability in the family: Leads across most benchmarks—76.9 on MMMU-Pro, 83.9 on MMMU, and 86.7 on MMLU-Pro. It also leads the family on SuperGPQA at 67.1 and MMLU-Redux at 94.0, reflecting stronger expert-level knowledge depth.
  • Vision + language reasoning at scale: With the largest routing pool (256 experts, 8 routed + 1 shared) and 10B active parameters, this model handles the most demanding multimodal tasks in the family—long-document analysis over images, multi-step visual reasoning, and complex cross-modal instruction following at extended context lengths.

Try it

You're building an earnings research assistant for an investment team. Deploy Qwen3.5-122B-A10B in Microsoft Foundry to analyze earnings presentation slides submitted as images.

Financial research sample prompt: Given a slide containing a combination of charts, tables, and management commentary, extract key financial metrics (revenue, EBITDA, year-over-year growth), interpret the trend shown in any charts, and generate a two-paragraph analyst summary suitable for a morning briefing. Flag any metrics that deviate materially from prior-quarter guidance and indicate the direction of the deviation.

Getting started

You can deploy open-source Hugging Face models directly in Microsoft Foundry by browsing the Hugging Face collection in the Foundry model catalog and deploying to managed endpoints in just a few clicks. You can also start from the Hugging Face Hub. First, select any supported model and then choose "Deploy on Microsoft Foundry", which brings you straight into Azure with secure, scalable inference already configured. Learn how to discover models and deploy them using Microsoft Foundry documentation.

 

 

Updated Mar 02, 2026
Version 1.0
No CommentsBe the first to comment