Grok 4.0 Goes GA in Microsoft Foundry and Grok 4.1 Fast Arrives with Major Enhancements
We first brought Grok 4.0 to Microsoft Foundry in September 2025, marking an important milestone in expanding Foundry's multi-model ecosystem with frontier reasoning models from xAI. Since then, customer interest and usage have continued to build as developers explored Grok's strengths in fast reasoning, sense-making, and interpretation of complex, ambiguous information. Today, we're excited to announce that Grok 4.0 is now generally available (GA) in Microsoft Foundry, giving enterprises a production-ready path to deploy Grok at scale. Building on that momentum, Grok 4.1 Fast Non-Reasoning is now available in Microsoft Foundry, and Grok 4.1 Fast Reasoning is coming soon. Grok 4.1 introduces a suite of improvements that enhance conversation quality, creativity, and emotional intelligence while maintaining core reasoning strengths. According to xAI, Grok 4.1 delivers more natural, fluid dialogue than earlier versions.

Introducing Grok 4.1 Fast (Reasoning and Non-Reasoning)

Grok 4.1 Fast is optimized for speed, scale, and agentic execution, giving developers the flexibility to choose between reasoning and non-reasoning variants depending on workload requirements.

- Grok 4.1 Fast (Reasoning): Designed for scenarios that require rapid multi-step reasoning, structured decision-making, and interpretation of complex inputs. This variant is well-suited for agent workflows, analysis pipelines, and applications that need fast responses without sacrificing reasoning depth.
- Grok 4.1 Fast (Non-Reasoning): Optimized for maximum throughput and low latency, this variant is ideal for tasks such as summarization, classification, content transformation, and tool-driven execution, where deterministic speed and efficiency matter more than deep reasoning.

Together, these options allow teams to right-size performance and cost by selecting the appropriate Grok 4.1 Fast variant for each stage of an application, from high-volume preprocessing and orchestration to targeted reasoning tasks.

What's New with Grok 4.1 Fast?

Grok 4.1 brings several enhancements that broaden the model's applicability and user experience:

- Improved conversational quality: According to xAI, Grok 4.1 Fast offers smoother, more natural interaction patterns, making it more comfortable and intuitive to engage with, especially in multi-turn dialogues.
- Enhanced creativity and emotional awareness: According to xAI, Grok 4.1 Fast demonstrates stronger creative writing capabilities and greater emotional intelligence, helping it generate more expressive and engaging outputs that better align with human expectations.
- Reduced hallucination and better reliability: According to xAI, Grok 4.1 Fast produces fewer factual inaccuracies than its predecessor.

These enhancements make Grok 4.1 Fast a compelling choice for use cases that require engaging conversational interfaces, creative support, and rich natural language interaction. As with all frontier AI models, Grok 4.1 Fast introduces new capabilities alongside new operational considerations. Microsoft's safety and responsible AI evaluations indicate that Grok 4.1 Fast may demonstrate increased risks in safety testing compared with other models available through Azure. In practice, this means there may be an increased risk of generating explicit or potentially harmful content. To support responsible deployment, customers should implement system-level safety instructions and leverage Azure AI Content Safety (AACS) to help monitor and filter outputs.
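As a minimal sketch of what a system-level safety instruction looks like in practice, the snippet below calls a Grok 4.1 Fast deployment through the Azure AI Inference SDK. The endpoint, deployment name, and the wording of the instruction are illustrative placeholders, not the exact values Foundry provisions for you.

```python
import os

from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import SystemMessage, UserMessage
from azure.core.credentials import AzureKeyCredential

# Hypothetical endpoint and deployment name; substitute the values
# shown on your Foundry deployment page.
client = ChatCompletionsClient(
    endpoint=os.environ["FOUNDRY_ENDPOINT"],  # e.g. https://<resource>.services.ai.azure.com/models
    credential=AzureKeyCredential(os.environ["FOUNDRY_API_KEY"]),
)

response = client.complete(
    model="grok-4-1-fast-non-reasoning",  # illustrative deployment name
    messages=[
        # Application-level safety instruction applied to every request,
        # on top of Microsoft's system-applied safety prompt.
        SystemMessage(content="You are a helpful assistant. Refuse requests "
                              "for explicit, harmful, or unsafe content."),
        UserMessage(content="Summarize the key risks in this vendor contract: ..."),
    ],
    temperature=0.7,
)
print(response.choices[0].message.content)
```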
Because no single safety system can address every possible risk scenario, customers are encouraged to conduct their own evaluations and validation before deploying Grok 4.1 in production systems. To provide enhanced safety and enterprise reliability, Microsoft's deployment of Grok 4.1 features a system-applied safety prompt that cannot be disabled. Customers are expected to operate the model without attempting to bypass or interfere with this feature.

Enterprise-Ready Deployment via Microsoft Foundry

With Grok 4.0 now GA in Foundry, enterprises can incorporate advanced reasoning models into their workflows while benefiting from the governance, compliance, and operational tooling that Azure provides. Models hosted in Foundry can be deployed serverless or with provisioned throughput, and customers benefit from centralized billing, identity integration, and access to other Azure services. Foundry's model catalog also includes other Grok variants such as Grok 4.1 Fast and related non-reasoning SKUs, giving enterprises flexibility to balance performance, latency, and cost depending on their workloads.

Pricing

| Model | Deployment | Input/1M Tokens | Output/1M Tokens | Availability |
|---|---|---|---|---|
| Grok 4.1 Fast (Non-Reasoning) | Global Standard | $0.2 | $0.5 | Public Preview on 2/27/2026 |
| Grok 4.1 Fast (Reasoning) | Global Standard | $0.2 | $0.5 | Coming Soon |

Looking Ahead

The combination of Grok's deep reasoning capabilities with the enterprise readiness of Microsoft Foundry opens new possibilities for production AI applications, from complex analytical agents and research assistants to creative and customer-facing experiences. With Grok 4.1's conversational refinements further raising the model's usability and expressiveness, Foundry customers can now experiment with and scale a broader set of AI-driven solutions, all within a trusted, governed environment. As Microsoft continues to expand Foundry's catalog and partners like xAI continue to innovate, organizations have more options than ever to power next-generation AI applications across industries, use cases, and domains.

Try Grok 4.1 Non-Reasoning: AI Model Catalog | Microsoft Foundry Models
Announcing extended support for Fine Tuning gpt-4o and gpt-4o-mini

At Build 2025, we announced post-retirement, extended deployment and inference support for fine-tuned models. Today, we're excited to announce that we're extending fine-tuning training for current customers of our most popular Azure OpenAI models: gpt-4o (2024-08-06) and gpt-4o-mini (2024-07-18). Hundreds of customers have pushed trillions of tokens through fine-tuned versions of these models, and we're happy to provide even more runway for your AI agents and applications.

Already using these models in Foundry? We have you covered: come April, we'll be the only provider offering fine-tuning for gpt-4o and gpt-4o-mini. Keep fine-tuning!

Not yet using Microsoft Foundry? Get started today by migrating your training data to Microsoft Foundry and fine-tune gpt-4o and gpt-4o-mini using Global or Standard Training with your existing OpenAI code. You'll have the runway to continuously fine-tune or update your models. You have until March 31st, 2026, to become a fine-tuning customer of these models.

| Model | Version | Training retirement date | Deployment retirement date |
|---|---|---|---|
| gpt-4o | 2024-08-06 | No earlier than 2026-09-31¹ | 2027-03-31 |
| gpt-4o-mini | 2024-07-18 | No earlier than 2026-09-31¹ | 2027-03-31 |
| gpt-4.1 | 2025-04-14 | At base model retirement | One year after training retirement |
| gpt-4.1-mini | 2025-04-14 | At base model retirement | One year after training retirement |
| gpt-4.1-nano | 2025-04-14 | At base model retirement | One year after training retirement |
| o4-mini | 2025-04-16 | At base model retirement | One year after training retirement |

¹ For existing customers only. Otherwise, training retirement occurs at base model retirement.
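Because fine-tuning in Foundry works with your existing OpenAI code, kicking off a training job can be as small as the sketch below. The resource name, API version, and training file are placeholders; only the base model snapshot names come from the table above.

```python
import os

from openai import AzureOpenAI  # pip install openai

# Placeholder endpoint and credentials; use your own Foundry resource.
client = AzureOpenAI(
    azure_endpoint="https://<your-resource>.openai.azure.com",
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-10-21",  # illustrative API version
)

# Upload JSONL training data, then create the fine-tuning job.
training_file = client.files.create(
    file=open("training_data.jsonl", "rb"),
    purpose="fine-tune",
)

job = client.fine_tuning.jobs.create(
    model="gpt-4o-mini-2024-07-18",  # or "gpt-4o-2024-08-06"
    training_file=training_file.id,
)
print(job.id, job.status)
```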
Meet FLUX.2 [flex] for Text-Heavy Design and UI Prototyping Now Available on Microsoft Foundry

Within the FLUX.2 family, FLUX.2 [flex] is the specialist for text and graphic design production, while FLUX.2 [pro] is optimized for photorealistic and cinematic-quality outputs. Use FLUX.2 [flex] when typography, logos, UI copy, or on-brand visual accuracy is the priority.

Model Overview

FLUX.2 [flex] delivers fine-grained control for precise results. It is purpose-built for typography and text-forward workflows, maintaining small details in complex scenes while enabling rapid iteration at scale. Teams use it for production-grade brand design, product UI prototyping, and any workflow where accurate text rendering is critical. This positioning makes FLUX.2 [flex] ideal for platform builders and creative teams who need to balance speed, scale, and cost while maintaining creative control.

FLUX.2 [flex] is now available as a gated Public Preview. Fill out the Black Forest Lab's Flux models Limited Access Applications form to get access to these models today.

Key Capabilities

- Best-in-class text rendering for logos, UI copy, product packaging, and social graphics
- Fine-grained control: adjust inference steps and guidance scale to dial in precise outputs for production workflows
- Detail preservation: maintains fine details and small elements in complex scenes, keeping brand assets sharp at any resolution
- Multi-reference editing: supports up to 8 reference images via API for complex multi-image compositing workflows

Use Cases

FLUX.2 [flex] is specialized for workflows where accurate text rendering and fine-grained visual control are non-negotiable. It excels across brand design, product UI prototyping, social graphics, and any project requiring readable typography, precise layouts, or clean on-brand assets.

Brand and Graphic Design
- Generate precise, on-brand graphic design assets with clean, readable typography across logos, packaging, and marketing collateral
- Iterate across typefaces, layout styles, and color systems without sacrificing text accuracy or fine detail
- Produce production-ready social graphics and campaign visuals with correctly rendered copy

Creative & Media Production
- Render multi-word headlines, body copy blocks, and editorial layouts that adapt seamlessly to any style: modern, classic, handwritten, or dynamic 3D lettering
- Generate magazine covers, editorial spreads, and text-heavy visual layouts with correctly spaced, readable fonts at production quality
- Test copywriting concepts and headline variations at 3x the speed of previous-generation models

Product UI Prototyping
- Rapidly prototype product UIs with rich text components, accurate button labels, navigation copy, and on-screen text
- Build and iterate on app screens, dashboards, and design systems where text legibility and layout precision are critical

💡 Tip: Use FLUX.2 [pro] in Microsoft Foundry when your workflow prioritizes photorealistic imagery or cinematic-quality final renders rather than text-forward or graphic design outputs.

Join Black Forest Labs on Model Mondays

YouTube Livestream: March 2, 2026. Stephen Batifol, Developer Advocate at Black Forest Labs, joins Model Mondays to show how teams use BFL's generative image models on Microsoft Foundry to move from experimentation to real-world creative workflows at scale, with fewer touch-ups, stronger brand fidelity, and state-of-the-art image quality. This hands-on session features a live demo and practical patterns for building scalable image pipelines, so you can ship complete creative campaigns in days, not weeks.
RSVP Here

Pricing

Foundry Models are fully hosted and managed on Azure. FLUX.2 [flex] is available through pay-as-you-go on the Global Standard deployment type with the following pricing:

| Model | Type | Pricing |
|---|---|---|
| FLUX.2-flex | Global Standard | $0.05 per megapixel (MP) |

Important notes:
- For pricing, resolution is always rounded up to the next megapixel, separately for each reference image and for the generated image.
- 1 megapixel is counted as 1024x1024 pixels.
- For multiple reference images, each reference image is counted as 1 megapixel.
- Images exceeding 4 megapixels are resized to 4 megapixels.

Reference the Foundry Models pricing page for current pricing.

Build Trustworthy AI Solutions

Black Forest Labs models in Foundry Models are delivered under the Microsoft Product Terms, giving you enterprise-grade security and compliance out of the box. Each FLUX endpoint offers Content Safety controls and guardrails. Runtime protections include built-in content-safety filters, role-based access control, virtual-network isolation, and automatic Azure Monitor logging. Governance signals stream directly into Azure Policy, Purview, and Microsoft Sentinel, giving security and compliance teams real-time visibility. Together, Microsoft's capabilities let you create with more confidence, knowing that privacy, security, and safety are woven into every Black Forest Labs deployment from day one.

Getting Started with FLUX.2 in Microsoft Foundry

1. If you don't have an Azure subscription, you can sign up for an Azure account here.
2. Search for the model name, FLUX.2-flex, in the model catalog in Microsoft Foundry under "Build."
3. Open the model card in the model catalog.
4. Click on Deploy to obtain the inference API and key.
5. View your deployment under Build > Models. You should land on the deployment page that shows you the API and key in less than a minute.
6. Try out your prompts in the playground.
7. Use the API and key with various clients (a sketch follows below).

Learn More

FLUX.2 [flex] is now available as a gated Public Preview. Fill out the Black Forest Lab's Flux models Limited Access Applications form to get access to these models today.

- ▶️ RSVP for the next Model Monday LIVE on YouTube or On-Demand
- 👩💻 Explore FLUX.2 Documentation on Microsoft Learn
- 👋 Continue the conversation on Discord
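To illustrate step 7 above, here is a minimal sketch of calling a FLUX.2 [flex] deployment over REST. The request path, payload field names, and response shape are assumptions for illustration only; check your deployment page and the model card for the exact contract.

```python
import base64
import os

import requests

# Placeholder endpoint and API key from your Foundry deployment page.
ENDPOINT = os.environ["FLUX_ENDPOINT"]  # hypothetical full inference URL
API_KEY = os.environ["FLUX_API_KEY"]

# Illustrative payload; field names may differ, consult the model card.
payload = {
    "prompt": "Minimal logo for 'Aurora Coffee', clean sans-serif wordmark",
    "width": 1024,
    "height": 1024,
    "steps": 28,        # FLUX.2 [flex] exposes inference steps ...
    "guidance": 4.0,    # ... and guidance scale for fine-grained control
}

resp = requests.post(
    ENDPOINT,
    headers={"Authorization": f"Bearer {API_KEY}"},
    json=payload,
    timeout=120,
)
resp.raise_for_status()

# Assumes a base64-encoded image in the response; adjust to the real schema.
image_b64 = resp.json()["image"]
with open("logo.png", "wb") as f:
    f.write(base64.b64decode(image_b64))
```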
New Azure OpenAI models bring fast, expressive, and real-time AI experiences in Microsoft Foundry

Modern AI applications, whether voice-first experiences or large software systems, rarely fit into a single prompt. Real work unfolds over time: maintaining context, following instructions, invoking tools, and adapting as requirements evolve. When these foundations break down through latency spikes, instruction drift, or unreliable tool calls, both user conversations and developer workflows suffer. OpenAI's latest models address this shared challenge by prioritizing continuity and reliability across real-time interaction and long-running engineering tasks. Starting today, GPT-Realtime-1.5, GPT-Audio-1.5, and GPT-5.3-Codex are rolling out in Microsoft Foundry. Together, these models reflect the growing needs of the modern developer and push the needle from short, stateless interactions toward AI systems that can reason, act, and collaborate over time.

GPT-5.3-Codex at a glance

GPT-5.3-Codex brings together advanced coding capability with broader reasoning and professional problem solving in a single model built for real engineering work. It unifies the frontier coding performance of GPT-5.2-Codex with the reasoning and professional knowledge capabilities of GPT-5.2 in one system. This shifts the experience from optimizing isolated outputs to supporting longer-running development efforts, where repositories are large, changes span multiple steps, and requirements aren't always fully specified at the start.

What's improved

- 25% faster execution time than its predecessors, according to OpenAI, so developers can accelerate development of new applications.
- Built for long-running tasks that involve research, tool use, and complex, multi-step execution while maintaining context.
- Mid-task steerability and frequent updates allow developers to redirect and collaborate with the model as it works without losing context.
- Stronger computer-use capabilities allow developers to execute across the full spectrum of technical work.

Common use cases

Developers and teams can apply GPT-5.3-Codex across a wide range of scenarios, including:

- Refactoring and modernizing large or legacy applications
- Performing multi-step migrations or upgrades
- Running agentic developer workflows that span analysis, implementation, testing, and remediation
- Automating code reviews, test generation, and defect detection
- Supporting development in security-sensitive or regulated environments

Pricing

| Model | Input Price/1M Tokens | Cached Input Price/1M Tokens | Output Price/1M Tokens |
|---|---|---|---|
| GPT-5.3-Codex | $1.75 | $0.175 | $14.00 |
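As a minimal sketch of the code-review use case, the snippet below sends a review task to a GPT-5.3-Codex deployment through the Azure OpenAI chat completions API. The resource name, deployment name, and API version are placeholders for your own values.

```python
import os

from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://<your-resource>.openai.azure.com",  # placeholder
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-10-21",  # illustrative API version
)

completion = client.chat.completions.create(
    model="gpt-5.3-codex",  # your deployment name (illustrative)
    messages=[
        {"role": "system",
         "content": "You are a code-review agent. Propose minimal, safe fixes."},
        {"role": "user",
         "content": "Review this function for race conditions:\n\n"
                    "def transfer(a, b, amount):\n"
                    "    a.balance -= amount\n"
                    "    b.balance += amount\n"},
    ],
)
print(completion.choices[0].message.content)
```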
GPT-Realtime-1.5 and GPT-Audio-1.5 at a glance

These models deliver measurable gains in reasoning and speech understanding for real-time voice interactions on Microsoft Foundry. In OpenAI's evaluations, they show a +5% lift on Big Bench Audio (reasoning), a +10.23% improvement in alphanumeric transcription, and a +7% gain in instruction following, while maintaining low-latency performance.

What's improved

- More natural-sounding speech: audio output is smoother and more conversational, with improved pacing and prosody.
- Higher audio quality: clearer, more consistent audio output across supported voices.
- Improved instruction following: better alignment with developer-provided system and user instructions during live interactions.
- Function calling support: enables structured, tool-driven interactions within real-time audio flows.

Common use cases

Developers are using GPT-Realtime-1.5 and GPT-Audio-1.5 for scenarios where low-latency voice interaction is essential, including:

- Conversational voice agents for customer support or internal help desks
- Voice-enabled assistants embedded in applications or devices
- Live voice interfaces for kiosks, demos, and interactive experiences
- Hands-free workflows where audio input and output replace keyboard interaction

Pricing (per 1M tokens)

| Model | Text Input | Text Cached Input | Text Output | Audio Input | Audio Cached Input | Audio Output | Image Input | Image Cached Input | Image Output |
|---|---|---|---|---|---|---|---|---|---|
| GPT-Realtime-1.5 | $4.00 | $0.04 | $16.00 | $32.00 | $0.40 | $64.00 | $4.00 | $0.04 | $16.00 |
| GPT-Audio-1.5 | $2.50 | n/a | $10.00 | $32.00 | n/a | $64.00 | $2.50 | n/a | $10.00 |

Getting started in Microsoft Foundry

Start building in Microsoft Foundry, evaluate performance, and explore Azure OpenAI models today. Foundry brings evaluation, deployment, and governance into a single workflow, helping teams progress from experiments to scalable applications while maintaining security and operational controls.
What's trending on Hugging Face: PubMedBERT Base Embeddings, Paraphrase Multilingual MiniLM, BGE-M3

The embedding model landscape has evolved beyond one-size-fits-all solutions. Today's developers navigate a set of deliberate trade-offs: domain specialization to improve accuracy in vertical applications, multilingual capabilities to support global use cases, and retrieval strategies that optimize performance at scale. Once a model demonstrates strong semantic performance, predictable behavior, and broad community support, it often becomes a trusted reference baseline that developers build around and deploy with confidence.

This week, we're not spotlighting models that are new to Microsoft Foundry. Instead, we're turning our attention to models that have managed to stay relevant in a rapidly expanding sea of options. This week's Model Mondays edition highlights three Hugging Face models: NeuML's PubMedBERT Base Embeddings for domain-specific medical text understanding, Sentence Transformers' Paraphrase Multilingual MiniLM for lightweight cross-lingual semantic similarity, and BAAI's BGE-M3 for multi-functional long-context retrieval across 100+ languages.

Models of the week

NeuML: PubMedBERT Base Embeddings

Model Specs
- Parameters / size: 109M
- Context length: 512 tokens
- Primary task: Embeddings (medical domain)

Why it's interesting
- Domain-specific performance gains: Fine-tuned on PubMed title-abstract pairs, achieving 95.62% average Pearson correlation across medical benchmarks, outperforming general-purpose models like gte-base (95.37%), bge-base-en-v1.5 (93.78%), and all-MiniLM-L6-v2 (93.46%) on medical literature tasks
- Production-validated for medical RAG: With 141K downloads and deployment in 30+ medical AI applications, this model demonstrates consistent real-world performance for clinical research, drug discovery, and biomedical semantic search pipelines
- Built on Microsoft's BiomedNLP foundation: Extends the BiomedBERT family with sentence-transformers mean pooling, creating 768-dimensional embeddings optimized for medical literature clustering and retrieval

Try it

Clinical research sample prompt:

You're building a clinical decision support system for oncology. Deploy PubMedBERT Base Embeddings in Microsoft Foundry to index 50,000 recent cancer research abstracts from PubMed. A physician queries: "What are the cardiotoxicity risks of combining checkpoint inhibitors with anthracycline chemotherapy in elderly patients?" Embed the query, retrieve the top 10 most semantically similar abstracts using cosine similarity, and return citations with PubMed IDs for evidence-based treatment planning.
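That retrieval flow maps to only a few lines of code with the sentence-transformers library. A minimal sketch, assuming the corpus is already loaded as abstract strings with matching PubMed IDs (the toy data below stands in for the 50,000 indexed abstracts):

```python
from sentence_transformers import SentenceTransformer, util  # pip install sentence-transformers

model = SentenceTransformer("NeuML/pubmedbert-base-embeddings")

# Toy corpus standing in for the full abstract index.
abstracts = [
    "Anthracycline-induced cardiotoxicity in elderly oncology patients ...",
    "Checkpoint inhibitor combination therapy outcomes in NSCLC ...",
]
pubmed_ids = ["PMID:11111111", "PMID:22222222"]

corpus_embeddings = model.encode(abstracts, convert_to_tensor=True)

query = ("What are the cardiotoxicity risks of combining checkpoint "
         "inhibitors with anthracycline chemotherapy in elderly patients?")
query_embedding = model.encode(query, convert_to_tensor=True)

# Top-k cosine-similarity retrieval (top_k=10 in production; 2 here).
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=2)[0]
for hit in hits:
    print(pubmed_ids[hit["corpus_id"]], round(hit["score"], 3))
```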
Sentence Transformers: Paraphrase Multilingual MiniLM L12 v2

Model Specs
- Parameters / size: 117M
- Context length: 128 tokens
- Primary task: Embeddings (multilingual, sentence similarity)

Why it's interesting
- Multilingual adoption: Supports 50+ languages including Arabic, Chinese, Hebrew, Hindi, Japanese, Korean, Russian, Thai, and Vietnamese, with 18.4 million downloads last month demonstrating production-scale validation across global deployments
- Compact architecture for edge deployment: At 117M parameters producing 384-dimensional embeddings, this model balances multilingual coverage with inference efficiency, making it ideal for resource-constrained environments or high-throughput applications
- Sentence-BERT foundation: Based on the influential Sentence-BERT paper (Reimers & Gurevych, 2019), using siamese BERT networks with mean pooling to create semantically meaningful sentence embeddings for clustering, paraphrase detection, and cross-lingual search
- Community-proven versatility: With 299 fine-tuned variants and 100+ Spaces implementations, this model serves as a peer-reviewed starting point for multilingual semantic similarity tasks, from customer support ticket routing to cross-lingual document retrieval

Try it

E-commerce sample prompt:

You're building a global customer support platform for an e-commerce company operating in 30 countries. Deploy Paraphrase Multilingual MiniLM in Microsoft Foundry to process incoming support tickets in English, Spanish, French, German, Portuguese, Japanese, and Korean. Embed each ticket as a 384-dimensional vector and cluster by semantic similarity to automatically route issues to specialized teams (payment, shipping, returns, technical). Flag duplicate tickets with cosine similarity > 0.85 to prevent redundant responses.

BAAI: BGE-M3

Model Specs
- Parameters / size: ~560M
- Context length: 8192 tokens
- Primary task: Embeddings (multi-functional: dense, sparse, multi-vector)

Why it's interesting
- Three retrieval modes in one model: Uniquely supports dense retrieval (1024-dim embeddings), sparse retrieval (lexical matching like BM25), and multi-vector retrieval (ColBERT-style fine-grained matching), enabling hybrid search pipelines without maintaining separate models or indexes
- Exceptional long-context capability: The 8192-token context window handles full documents, legal contracts, research papers, and lengthy technical content, validated on MLDR (13-language document retrieval) and NarrativeQA (long-form question answering) benchmarks
- Multilingual dominance: Outperforms OpenAI embeddings on MIRACL multilingual retrieval across 13+ languages and demonstrates strong zero-shot cross-lingual transfer on MKQA

Try it

Legal document search sample prompt:

You're building a legal document search system for a multinational law firm. Deploy BGE-M3 in Microsoft Foundry to index 5,000 full-length commercial contracts (average 6,000 tokens each) in English, French, German, and Spanish. A lawyer queries: "Find all force majeure clauses that exclude liability for pandemics or global health emergencies." Use hybrid retrieval: (1) dense embeddings for semantic similarity to capture concept variations like "Act of God" or "unforeseen circumstances", (2) sparse retrieval for exact keyword matches on "force majeure", "pandemic", "health emergency". Combine scores with weighted sum (0.6 dense + 0.4 sparse) and return top 15 contract sections with clause numbers and jurisdiction metadata.
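A minimal sketch of the hybrid scoring described in that prompt, assuming the FlagEmbedding package (BGE-M3's reference implementation); the 0.6/0.4 weights mirror the prompt above, and the toy passages stand in for indexed contract sections.

```python
from FlagEmbedding import BGEM3FlagModel  # pip install FlagEmbedding

model = BGEM3FlagModel("BAAI/bge-m3", use_fp16=True)

query = "Find force majeure clauses that exclude liability for pandemics"
passages = [
    "Force majeure: neither party is liable for delays caused by pandemics ...",
    "Termination for convenience requires 30 days written notice ...",
]

# One encode call returns both dense vectors and sparse lexical weights.
q = model.encode([query], return_dense=True, return_sparse=True)
p = model.encode(passages, return_dense=True, return_sparse=True)

for i, passage in enumerate(passages):
    # Dot product equals cosine similarity (dense vectors are normalized).
    dense = float(q["dense_vecs"][0] @ p["dense_vecs"][i])
    sparse = model.compute_lexical_matching_score(
        q["lexical_weights"][0], p["lexical_weights"][i]
    )
    print(f"{0.6 * dense + 0.4 * sparse:.3f}", passage[:50])
```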
Getting started

You can deploy open-source Hugging Face models directly in Microsoft Foundry by browsing the Hugging Face collection in the Foundry model catalog and deploying to managed endpoints in just a few clicks. You can also start from the Hugging Face Hub: select any supported model and choose "Deploy on Microsoft Foundry", which brings you straight into Azure with secure, scalable inference already configured. Learn how to discover models and deploy them using the Microsoft Foundry documentation.

Follow along the Model Mondays series and access the GitHub to stay up to date on the latest:
- Read Hugging Face on Azure docs
- Learn about one-click deployments from the Hugging Face Hub on Microsoft Foundry
- Explore models in Microsoft Foundry
Unlocking Document Understanding with Mistral Document AI in Microsoft Foundry

Enterprises today face a familiar yet formidable challenge: mountains of documents (contracts, invoices, reports, forms) remain locked in unstructured formats. Traditional OCR (optical character recognition) captures text but often struggles with context, layout complexity, or multilingual content. The result? Slow workflows, error-prone manual reviews, and missed insights.

Enter mistral-document-ai-2512 in Microsoft Foundry. This new model brings together high-end OCR (using mistral-ocr-2512) and intelligent document understanding (using mistral-small-2506) to turn unstructured documents into actionable data. It doesn't just "read" pages; it understands them: multi-column layouts, handwritten annotations, tables with merged cells, and multilingual content, all processed with enterprise-grade speed and precision. In this blog, we'll explore what Mistral Document AI 2512 is, why it matters, how it stacks up, and the business impact it promises, especially when paired with solution accelerators like ARGUS.

Meet Mistral Document AI

Mistral Document AI is an enterprise-grade document understanding model, offered via Microsoft Foundry. It's built to convert both physical (scans, photos) and digital (PDFs, DOCX) documents into highly structured, machine-readable outputs. Key features include:

- Top-tier accuracy: According to benchmarks, Mistral's OCR 2512 stack displays significantly higher accuracy than many alternatives, especially on scanned documents and complex layouts. For example, in comparisons it achieved ~95.9% overall accuracy vs. ~89-91% for other platforms
- Global / multilingual reach: In language-by-language tests (Russian, French, German, Spanish, Chinese, etc.), Mistral's error-rate/fuzzy-match metrics reached 99%+ in many cases
- Layout & context awareness: Built to not just extract linear text but understand multi-column layouts, tables, charts, images, handwritten input, and more
- Structured output functionality: Supports structured extraction (JSON) and markup (Markdown with interleaved images), preserving document structure for downstream systems
- Enterprise-ready deployment: Available via Microsoft Foundry with support for private/secure inference, geared for regulated industries and high-volume workflows

Putting it another way: where traditional OCR stops at "here's the raw text on page 7", Mistral Document AI 2512 can say "here's the vendor invoice, here are the line items, here's the total, here's the signature block, and here's the part that was handwritten", ready to plug into downstream systems.
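As a rough sketch of what a call might look like, the snippet below uses the mistralai Python SDK's OCR endpoint pointed at a Foundry deployment. The server URL, model name, and document URL are assumptions; verify them against the Foundry model card for mistral-document-ai-2512.

```python
import os

from mistralai import Mistral  # pip install mistralai

# Hypothetical Foundry endpoint; copy yours from the deployment page.
client = Mistral(
    api_key=os.environ["FOUNDRY_API_KEY"],
    server_url="https://<your-resource>.services.ai.azure.com/providers/mistral",
)

# OCR a hosted PDF and get Markdown with document structure preserved.
result = client.ocr.process(
    model="mistral-document-ai-2512",  # deployment/model name may differ
    document={
        "type": "document_url",
        "document_url": "https://example.com/sample-invoice.pdf",
    },
)

for page in result.pages:
    print(page.markdown)  # structured Markdown per page
```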
Business Impact & Industry Examples

Mistral Document AI isn't just another OCR tool; it's a strategic enabler that turns document-heavy operations into intelligent, automated workflows. The business value comes down to four key advantages:

- Speed and efficiency: Automating document understanding eliminates manual reviews and retyping. Tasks that took days can be done in minutes, accelerating core business processes
- Accuracy and consistency: With 99%+ recognition accuracy and deep layout understanding, Mistral delivers cleaner data and fewer downstream errors, essential in compliance-critical or analytics-driven operations
- Cost and productivity gains: Reducing manual extraction frees teams for higher-value work, cutting operational costs while increasing output per employee
- Scalability and adaptability: Cloud-native performance allows organizations to scale document processing instantly during peak loads, across multiple languages and formats, without sacrificing quality

Overall, mistral-document-ai-2512 excels where consistency and quality are critical.

Industry and Use Cases

In regulated industries or big-data scenarios, even a small improvement in accuracy or speed can translate into substantial business gains. Its benchmarks indicate not just incremental progress but a major step forward, giving enterprises a powerful new engine for their document workflows. Here's where that impact becomes tangible:

- Financial services: Banks and insurers handle vast document volumes (loan applications, KYC forms, and claims reports) where data integrity and auditability are non-negotiable. Mistral automates extraction, classification, and clause identification across diverse formats, improving turnaround time and compliance accuracy while reducing manual handling costs
- Healthcare & life sciences: Clinical records, lab results, and insurance claims often combine handwritten, tabular, and multi-language content. Mistral's layout awareness and multilingual support ensure clean, structured datasets for downstream analytics and regulatory submissions
- Manufacturing & logistics: From quality certificates to shipping manifests, Mistral streamlines the flow of operational documents. It can extract production parameters, vendor data, and timestamps at scale, building a unified, queryable data layer that supports supply chain traceability
- Legal & public sector: Legal teams and agencies depend on consistency and transparency. Mistral helps index, summarise, and validate contracts or permits with full structural fidelity, dramatically cutting review cycles while maintaining evidential quality
- Retail & consumer goods: Retailers process supplier invoices, product specifications, and marketing briefs from global partners. With Mistral's multilingual precision and structure preservation, global document flows become searchable and analytics-ready

Across these industries, the result is the same: cleaner data, faster throughput, and fewer human errors, the foundation for more reliable decisions and more agile operations.

Pricing

ARGUS: a ready-to-implement accelerator to start using Mistral Document AI

To spin up a solution faster, you can leverage solution accelerators such as ARGUS (an open-source repository available on GitHub). ARGUS serves as a full-pipeline implementation: from document ingestion, through OCR/extraction (via Mistral Document AI), to downstream processing and structured output. It shows how to deploy end-to-end, integrate with storage, preprocess documents, handle large-scale batches, output JSON schemas, and integrate into existing business workflows.

Mistral Document AI Integration

ARGUS now offers flexible OCR provider selection, with Mistral Document AI as one of several options. This enhancement gives you the freedom to choose the best OCR engine for your specific document processing needs.

Key Features:
- Dual provider support: Toggle between Azure Document Intelligence (default) and Mistral Document AI
- Runtime switching: Change OCR providers on-the-fly through the Settings UI without redeployment
- Simple configuration: Set up Mistral via environment variables (OCR_PROVIDER, MISTRAL_DOC_AI_ENDPOINT, MISTRAL_DOC_AI_KEY) or the web interface, as shown in the sketch after this list
- Seamless integration: Both providers expose the same interface, ensuring consistent behavior across your document processing pipeline
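A minimal sketch of that environment-variable configuration, here set from Python before launching ARGUS. The variable names come from the ARGUS documentation above, while the provider value and endpoint format are illustrative assumptions.

```python
import os

# Select the OCR engine ARGUS should use ("mistral" is an assumed value;
# check the ARGUS README for the exact accepted strings).
os.environ["OCR_PROVIDER"] = "mistral"

# Placeholder endpoint and key for your Mistral Document AI deployment.
os.environ["MISTRAL_DOC_AI_ENDPOINT"] = "https://<your-resource>.services.ai.azure.com"
os.environ["MISTRAL_DOC_AI_KEY"] = "<your-api-key>"
```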
Why This Matters

Different OCR engines excel at processing different document content. Azure Document Intelligence offers enterprise-grade form and table recognition, while Mistral Document AI 2512 additionally enables extraction to structured JSON with customizable schemas, document classification, and image processing, including text, charts, and signatures. It can convert charts into tables, extract fine print from figures, and even define custom image types for specialized workflows. Now you can select the optimal provider for each use case. In effect, instead of building from scratch, ARGUS gives you the legs to run: pipeline orchestration, ingestion, error handling, schema mapping, and output integration, all wired to Mistral's engine. This significantly accelerates time-to-value and reduces risk for enterprise adopters.

Getting Started:
1. Navigate to the ARGUS frontend interface (Streamlit app) and click on the Settings tab.
2. In the OCR Provider Configuration section, select your preferred provider.
3. If using Mistral, enter your endpoint URL, API key, and model name.
4. Click Update OCR Provider to apply changes immediately; no restart required.
5. All new document processing will use your selected OCR engine.

If your organization is looking to unlock document intelligence, here's a structured path:

- Explore Mistral Document AI via Microsoft Foundry: Browse the model card, review endpoint specs, and try sample documents to test accuracy and extraction structure
- Deploy and pilot with ARGUS: Use the GitHub repo to spin up an end-to-end pipeline on a small workload (e.g., a batch of invoices or contracts) and compare manual vs. AI-driven throughput and error rates
- Define business value metrics: Track processing time, error rate, manual hours saved, and downstream impact (faster decision cycles, fewer reworks)
- Scale and govern: Once the pilot proves value, expand into multiple document types, languages, and geographies, and ensure governance (data handling, compliance, model monitoring)
- Embed continuous improvement: As usage grows, feed back learnings, tune schema definitions, refine extraction rules, and extend into QA, insights, or analytics layers

Conclusion

In today's data-rich but document-heavy environment, the ability to truly understand documents (and not just digitize them) is becoming a strategic imperative. Mistral Document AI represents a next-generation shift: accurate, layout-aware, multilingual, structured. When paired with accelerators like ARGUS, enterprises can move from manual bottlenecks to streamlined, insight-rich document workflows. If you're thinking about unlocking the value buried in your documents, be it invoices, contracts, forms, or reports, now is the time. With mistral-document-ai-2512, what used to be a cost center is now a potential performance lever. Ready to get started? Explore the model, and let your documents begin talking back.
Now in Foundry: Qwen3-Coder-Next, Qwen3-ASR-1.7B, Z-Image

This week's spotlight features three models that demonstrate enterprise-grade AI across the full scope of modalities. From low-latency coding agents to state-of-the-art multilingual speech recognition and foundation-quality image generation, these models showcase the breadth of innovation happening in open-source AI. Each model balances performance with practical deployment considerations, making them viable for production systems while pushing the boundaries of what's possible in their respective domains.

This week's Model Mondays edition highlights Qwen3-Coder-Next, an 80B MoE model that activates only 3B parameters while delivering coding agent capabilities with 256k context; Qwen3-ASR-1.7B, which achieves state-of-the-art accuracy across 52 languages and dialects; and Z-Image from Tongyi-MAI, an undistilled text-to-image foundation model with full Classifier-Free Guidance support for professional creative workflows.

Models of the week

Qwen: Qwen3-Coder-Next

Model Specs
- Parameters / size: 80B total (3B activated)
- Context length: 262,144 tokens
- Primary task: Text generation (coding agents, tool use)

Why it's interesting
- Extreme efficiency: Activates only 3B of 80B parameters while delivering performance comparable to models with 10-20x more active parameters, making advanced coding agents viable for local deployment on consumer hardware
- Built for agentic workflows: Excels at long-horizon reasoning, complex tool usage, and recovering from execution failures, critical capabilities for autonomous development that go beyond simple code completion
- Benchmarks: Competitive performance with significantly larger models on SWE-bench and coding benchmarks (Technical Report)

Try it

| Use Case | Prompt Pattern |
|---|---|
| Code generation with tool use | Provide task context, available tools, and execution environment details |
| Long-context refactoring | Include full codebase context within the 256k window with specific refactoring goals |
| Autonomous debugging | Present error logs, stack traces, and relevant code with failure recovery instructions |
| Multi-file code synthesis | Describe architecture requirements and file structure expectations |

Financial services sample prompt:

You are a coding agent for a fintech platform. Implement a transaction reconciliation service that processes batches of transactions, detects discrepancies between internal records and bank statements, and generates audit reports. Use the provided database connection tool, logging utility, and alert system. Handle edge cases including partial matches, timing differences, and duplicate transactions. Include unit tests with 90%+ coverage.
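To try the local-deployment claim above, here is a minimal inference sketch with Hugging Face transformers. The repository ID is assumed from the model name (confirm it on the Hub), and a quantized build is more realistic on consumer hardware than the full-precision weights loaded here.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed Hub repo id; confirm on the model card before use.
MODEL_ID = "Qwen/Qwen3-Coder-Next"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype="auto",
    device_map="auto",  # only 3B params are active per token, but the
)                       # full 80B weights must still fit in memory

messages = [
    {"role": "system", "content": "You are a precise coding agent."},
    {"role": "user", "content": "Write a Python function that reconciles "
                                "two lists of transactions by (id, amount)."},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```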
Qwen: Qwen3-ASR-1.7B

Model Specs
- Parameters / size: 1.7B
- Context length: 256 tokens (default), configurable up to 4096
- Primary task: Automatic speech recognition (multilingual)

Why it's interesting
- All-in-one multilingual capability: A single 1.7B model handles language identification plus speech recognition for 30 languages, 22 Chinese dialects, and English accents from multiple regions, eliminating the need to manage separate models per language
- Specialized audio versatility: Transcribes not just clean speech but singing voice, songs with background music, and extended audio files, expanding use cases beyond traditional ASR to entertainment and media workflows
- State-of-the-art accuracy: Outperforms GPT-4o, Gemini-2.5, and Whisper-large-v3 across multiple benchmarks. English: Tedlium 4.50 WER vs 7.69/6.15/6.84; Chinese: WenetSpeech 4.97/5.88 WER vs 15.30/14.43/9.86 (Technical Paper)
- Language ID included: 97.9% average accuracy across benchmark datasets for automatic language identification, eliminating the need for separate language detection pipelines

Try it

| Use Case | Prompt Pattern |
|---|---|
| Multilingual transcription | Send audio files via API with automatic language detection |
| Call center analytics | Process customer service recordings to extract transcripts and identify languages |
| Content moderation | Transcribe user-generated audio content across multiple languages |
| Meeting transcription | Convert multilingual meeting recordings to text for documentation |

Customer support sample prompt:

Deploy Qwen3-ASR-1.7B to a Microsoft Foundry endpoint and transcribe multilingual customer service calls. Send audio files via API to automatically detect the language (from 52 supported options including 30 languages and 22 Chinese dialects) and generate accurate transcripts. Process calls from customers speaking English, Spanish, Mandarin, Cantonese, Arabic, French, and other languages without managing separate models per language. Use transcripts for quality assurance, compliance monitoring, and customer sentiment analysis.

Tongyi-MAI: Z-Image

Model Specs
- Parameters / size: 6B
- Context length: N/A (text-to-image)
- Primary task: Text-to-image generation

Why it's interesting
- Undistilled foundation model: A full-capacity base without distillation preserves the complete training signal, with Classifier-Free Guidance support (a technique that improves prompt adherence and output quality) enabling complex prompt engineering and negative prompting that distilled models cannot achieve
- High output diversity: Generates distinct character identities in multi-person scenes with varied compositions, facial features, and lighting, critical for creative applications requiring visual variety rather than consistency
- Aesthetic versatility: Handles diverse visual styles from hyper-realistic photography to anime and stylized illustrations within a single model, supporting resolutions from 512×512 to 2048×2048 at any aspect ratio with 28-50 inference steps (Technical Paper)

Try it

E-commerce sample prompt:

Professional product photography of a modern ergonomic office chair in a bright Scandinavian-style home office. Natural window lighting from left, clean white desk with laptop and succulent plant, light oak hardwood floor. Chair positioned at 45-degree angle showing design details. Photorealistic, commercial photography, sharp focus, 85mm lens, f/2.8, soft shadows.
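A heavily hedged generation sketch using diffusers. Whether Z-Image ships a diffusers-compatible pipeline is an assumption here (hence trust_remote_code), as are the repository ID and guidance value; the 28-step setting comes from the range cited above.

```python
import torch
from diffusers import DiffusionPipeline

# Assumed Hub repo id and custom-pipeline support; verify on the model card.
pipe = DiffusionPipeline.from_pretrained(
    "Tongyi-MAI/Z-Image",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)
pipe.to("cuda")

image = pipe(
    prompt="Professional product photography of an ergonomic office chair, "
           "Scandinavian home office, natural window light, photorealistic",
    negative_prompt="blurry, low quality, distorted geometry",  # CFG enables this
    num_inference_steps=28,  # model card cites 28-50 steps
    guidance_scale=4.0,      # illustrative CFG weight
).images[0]

image.save("chair.png")
```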
Getting started

You can deploy open-source Hugging Face models directly in Microsoft Foundry by browsing the Hugging Face collection in the Foundry model catalog and deploying to managed endpoints in just a few clicks. You can also start from the Hugging Face Hub: select any supported model and choose "Deploy on Microsoft Foundry", which brings you straight into Azure with secure, scalable inference already configured. Learn how to discover models and deploy them using the Microsoft Foundry documentation.

Follow along the Model Mondays series and access the GitHub to stay up to date on the latest:
- Read Hugging Face on Azure docs
- Learn about one-click deployments from the Hugging Face Hub on Microsoft Foundry
- Explore models in Microsoft Foundry
What is trending in Hugging Face on Microsoft Foundry? Feb 2, 2026

Open-source AI is moving fast, with important breakthroughs in reasoning, agentic systems, multimodality, and efficiency emerging every day. Hugging Face has been a leading platform where researchers, startups, and developers share and discover new models. Microsoft Foundry brings these trending Hugging Face models into a production-ready experience, where developers can explore, evaluate, and deploy them within their Azure environment. Our weekly Model Mondays series highlights Hugging Face models available in Foundry, focusing on what matters most to developers: why a model is interesting, where it fits, and how to put it to work quickly.

This week's Model Mondays edition highlights three Hugging Face models: a powerful Mixture-of-Experts model from Z.AI designed for lightweight deployment, Meta's unified foundation model for image and video segmentation, and MiniMax's latest open-source agentic model optimized for complex workflows.

Models of the week

Z.AI's GLM-4.7-Flash

Model Basics
- Model name: zai-org/GLM-4.7-Flash
- Parameters / size: 30B total / 3B active
- Default settings: 131,072 max new tokens
- Primary task: Agentic, reasoning, and coding

Why this model matters
- Why it's interesting: It utilizes a Mixture-of-Experts (MoE) architecture (30B total parameters, 3B active) to offer a new option for lightweight deployment. It demonstrates strong performance on logic and reasoning benchmarks, outperforming similarly sized models like gpt-oss-20b on the AIME 25 and GPQA benchmarks. It supports advanced inference features like a "Preserved Thinking" mode for multi-turn agentic tasks.
- Best-fit use cases: Lightweight local deployment, multi-turn agentic tasks, and logical reasoning applications.
- What's notable: From the Foundry catalog, users can deploy on an A100 instance, or run unsloth/GLM-4.7-Flash-GGUF on a CPU. It achieves open-source SOTA scores among models of comparable size. Additionally, compared to similarly sized models, GLM-4.7-Flash demonstrates superior frontend and backend development capabilities. Click to see more: https://docs.z.ai

Try it

| Use case | Best-practice prompt pattern |
|---|---|
| Agentic coding (multi-step repo work, debugging, refactoring) | Treat the model as an autonomous coding agent, not a snippet generator. Explicitly require task decomposition and step-by-step execution, then a single consolidated result. |
| Long-context agent workflows (local or low-cost autonomous agents) | Call out long-horizon consistency and context preservation. Instruct the model to retain earlier assumptions and decisions across turns. |

Now that you know GLM-4.7-Flash works best when you give it a clear goal and let it reason through a bounded task, here's an example prompt that a product or engineering team might use to identify risks and propose mitigations:

You are a software reliability analyst for a mid-scale SaaS platform. Review recent incident reports, production logs, and customer issues to uncover edge-case failures outside normal usage (e.g., rare inputs, boundary conditions, timing/concurrency issues, config drift, or unexpected feature interactions). Prioritize low-frequency, high-impact risks that standard testing misses. Recommend minimal, low-cost fixes (validation, guardrails, fallback logic, or documentation). Deliver a concise executive summary with sections: Observed Edge Cases, Root Causes, User Impact, Recommended Lightweight Fixes, and Validation Steps.
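For the CPU deployment path mentioned above, here is a minimal sketch with llama-cpp-python. The quantization filename pattern is an assumption; list the repo's files to pick an actual .gguf build.

```python
from llama_cpp import Llama  # pip install llama-cpp-python

# Downloads a quantized build from the Hub; the filename glob is illustrative.
llm = Llama.from_pretrained(
    repo_id="unsloth/GLM-4.7-Flash-GGUF",
    filename="*Q4_K_M.gguf",  # pick a real quant file from the repo
    n_ctx=8192,               # trim context to fit CPU memory
)

out = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a concise reasoning assistant."},
        {"role": "user", "content": "List three edge cases for a date parser."},
    ],
    max_tokens=256,
)
print(out["choices"][0]["message"]["content"])
```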
Meta's Segment Anything 3 (SAM3)

Model Basics
- Model name: facebook/sam3
- Parameters / size: 0.9B
- Primary task: Mask generation, Promptable Concept Segmentation (PCS)

Why this model matters
- Why it's interesting: It handles a vastly larger set of open-vocabulary prompts than SAM 2 and unifies image and video segmentation capabilities. It includes a "SAM 3 Tracker" mode that acts as a drop-in replacement for SAM 2 workflows with improved performance.
- Best-fit use cases: Open-vocabulary object detection, video object tracking, and automatic mask generation.
- What's notable: Introduces Promptable Concept Segmentation (PCS), allowing users to find all matching objects (e.g., "dial") via a text prompt rather than just single instances.

Try it

This model enables users to identify specific objects within video footage and isolate them over extended periods. With just one line of code, it is possible to detect multiple similar objects simultaneously. The accompanying GIF demonstrates how SAM3 efficiently highlights players wearing white on the field as they appear and disappear from view. Additional examples are available in the repository: https://github.com/facebookresearch/sam3/blob/main/assets/player.gif

| Use case | Best-practice prompt pattern |
|---|---|
| Promptable concept segmentation (images) | Treat SAM 3 as a concept detector, not an interactive click tool. Use short, concrete noun-phrase concept prompts instead of describing the scene or asking questions. Example prompt: "yellow school bus" or "shipping containers". Avoid verbs or full sentences. |
| Video segmentation + object tracking | Specify the same concept prompt once, then apply it across the video sequence. Do not restate the prompt per frame. Let the model maintain identity continuity. Example: "person wearing a red jersey". |
| Hard-to-name or visually subtle objects | Use exemplar-based prompts (image region or box) when text alone is ambiguous. Optionally combine positive and negative exemplars to refine the concept. Avoid over-constraining with long descriptions. |

Using the GIF above as a leading example, here is a prompt that shows how SAM 3 turns raw sports footage into structured, reusable data. By identifying and tracking players based on visual concepts like jersey color, sports leagues can turn tracked data into interactive experiences where automated player identification relays stats, fun facts, and more when built into a larger application. Here is a prompt that will allow you to start identifying specific players across video:

Act as a sports analytics operator analyzing football match footage. Segment and track all football players wearing blue jerseys across the video. Generate pixel-accurate segmentation masks for each player and assign persistent instance IDs that remain stable during camera movement, zoom, and player occlusion. Exclude referees, opposing team jerseys, sidelines, and crowd. Output frame-level masks and tracking metadata suitable for overlays, player statistics, and downstream analytics pipelines.
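A heavily hedged sketch of concept-prompted image segmentation. The transformers class names (Sam3Processor, Sam3Model) and the post-processing helper are assumptions based on how the library typically wraps segmentation models; treat the facebook/sam3 model card as the authoritative reference.

```python
import torch
from PIL import Image
from transformers import Sam3Model, Sam3Processor  # assumed class names

# Assumed transformers integration; verify on the facebook/sam3 model card.
processor = Sam3Processor.from_pretrained("facebook/sam3")
model = Sam3Model.from_pretrained("facebook/sam3")

image = Image.open("match_frame.jpg")

# Promptable Concept Segmentation: one short noun phrase finds all instances.
inputs = processor(images=image, text="player wearing a blue jersey",
                   return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Hypothetical post-processing helper; the real name and fields may differ.
results = processor.post_process_instance_segmentation(
    outputs, target_sizes=[image.size[::-1]]
)
print(f"found {len(results[0]['masks'])} matching instances")
```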
MiniMax AI's MiniMax-M2.1

Model Basics
- Model name: MiniMaxAI/MiniMax-M2.1
- Parameters / size: 229B total / 10B active
- Default settings: 200,000 max new tokens
- Primary task: Agentic and coding

Why this model matters
- Why it's interesting: It is optimized for robustness in coding, tool use, and long-horizon planning, outperforming Claude Sonnet 4.5 in multilingual scenarios. It excels in full-stack application development, capable of architecting apps "from zero to one". Where previous coding models focused on Python optimization, M2.1 brings enhanced capabilities in Rust, Java, Golang, C++, Kotlin, Objective-C, TypeScript, JavaScript, and other languages. The model delivers exceptional stability across various coding agent frameworks.
- Best-fit use cases: Lightweight local deployment, multi-turn agentic tasks, and logical reasoning applications.
- What's notable: The release of open-source weights for M2.1 delivers a massive leap over M2 on software engineering leaderboards. https://www.minimax.io/

Try it

| Use case | Best-practice prompt pattern |
|---|---|
| End-to-end agentic coding (multi-file edits, run-fix loops) | Treat the model as an autonomous coding agent, not a snippet generator. Explicitly require task decomposition and step-by-step execution, then a single consolidated result. |
| Long-horizon tool-using agents (shell, browser, Python) | Explicitly request stepwise planning and sequential tool use. M2.1's interleaved thinking and improved instruction-constraint handling are designed for complex, multi-step analytical tasks that require evidence tracking and coherent synthesis, not conversational back-and-forth. |
| Long-context reasoning & analysis (large documents / logs) | Declare the scope and desired output structure up front. MiniMax-M2.1 performs best when the objective and final artifact are clear, allowing it to manage long context and maintain coherence. |

Because MiniMax-M2.1 is designed to act as a long-horizon analytical agent, it shines when you give it a clear end goal and let it work through large volumes of information. Here's a prompt a risk or compliance team could use in practice:

You are a financial risk analysis agent. Analyze the following transaction logs and compliance policy documents to identify potential regulatory violations and systemic risk patterns. Plan your approach before executing. Work through the data step by step, referencing evidence where relevant. Deliver a final report with the following sections: Key Risk Patterns Identified, Supporting Evidence, Potential Regulatory Impact, Recommended Mitigations. Your response should be a complete, executive-ready report, not a conversational draft.

Getting started

You can deploy open-source Hugging Face models directly in Microsoft Foundry by browsing the Hugging Face collection in the Foundry model catalog and deploying to managed endpoints in just a few clicks. You can also start from the Hugging Face Hub: select any supported model and choose "Deploy on Microsoft Foundry", which brings you straight into Azure with secure, scalable inference already configured. Learn how to discover models and deploy them using the Microsoft Foundry documentation.

Follow along the Model Mondays series and access the GitHub to stay up to date on the latest:
- Read Hugging Face on Azure docs
- Learn about one-click deployments from the Hugging Face Hub on Microsoft Foundry
- Explore models in Microsoft Foundry
Answer synthesis in Foundry IQ: Quality metrics across 10,000 queries

With answers, you can control your entire RAG pipeline directly in Foundry IQ by Azure AI Search, without integrations. Responding only when the data supports it, answers delivers grounded, steerable, citation-rich responses and traces each piece of information to its original source. Here's how it works and how it performed across our experiments.