artificial intelligence
What’s trending on Hugging Face: PubMedBERT Base Embeddings, Paraphrase Multilingual MiniLM, BGE-M3
The embedding model landscape has evolved beyond one-size-fits-all solutions. Today’s developers navigate a set of deliberate trade‑offs: domain specialization to improve accuracy in vertical applications, multilingual capabilities to support global use cases, and retrieval strategies that optimize performance at scale. Once a model demonstrates strong semantic performance, predictable behavior, and broad community support, it often becomes a trusted reference baseline that developers build around and deploy with confidence. This week, we’re not spotlighting models that are new to Microsoft Foundry. Instead, we’re turning our attention to models that have stayed relevant in a rapidly expanding sea of options. This week's Model Mondays edition highlights three Hugging Face models: NeuML's PubMedBERT Base Embeddings for domain-specific medical text understanding, Sentence Transformers' Paraphrase Multilingual MiniLM for lightweight cross-lingual semantic similarity, and BAAI's BGE-M3 for multi-functional long-context retrieval across 100+ languages.
Models of the week

NeuML: PubMedBERT Base Embeddings

Model Specs
Parameters / size: 109M
Context length: 512 tokens
Primary task: Embeddings (medical domain)

Why it's interesting
Domain-specific performance gains: Fine-tuned on PubMed title-abstract pairs, achieving 95.62% average Pearson correlation across medical benchmarks, outperforming general-purpose models like gte-base (95.37%), bge-base-en-v1.5 (93.78%), and all-MiniLM-L6-v2 (93.46%) on medical literature tasks
Production-validated for medical RAG: With 141K downloads and deployment in 30+ medical AI applications, this model demonstrates consistent real-world performance for clinical research, drug discovery, and biomedical semantic search pipelines
Built on Microsoft's BiomedNLP foundation: Extends BioMed BERT family with sentence-transformers mean pooling, creating 768-dimensional embeddings optimized for medical literature clustering and retrieval

Try it
Clinical research sample prompt: You're building a clinical decision support system for oncology. Deploy PubMedBERT Base Embeddings in Microsoft Foundry to index 50,000 recent cancer research abstracts from PubMed. A physician queries: "What are the cardiotoxicity risks of combining checkpoint inhibitors with anthracycline chemotherapy in elderly patients?" Embed the query, retrieve the top 10 most semantically similar abstracts using cosine similarity, and return citations with PubMed IDs for evidence-based treatment planning.
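The retrieval step in the sample prompt above (embed the query, rank abstracts by cosine similarity, keep the top results) can be sketched in a few lines. This is a minimal illustration, not the model's or Foundry's API: the vectors are toy 3-dimensional stand-ins for the 768-dimensional embeddings, the `PMID-A`-style identifiers are hypothetical placeholders, and `top_k` is a helper name invented here.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: Sequence[float],
          corpus: Dict[str, Sequence[float]],
          k: int = 10) -> List[Tuple[str, float]]:
    """Return the k (doc_id, score) pairs most similar to the query, best first."""
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in corpus.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy vectors; in practice these would come from the deployed embedding endpoint.
corpus = {
    "PMID-A": [1.0, 0.0, 0.0],  # hypothetical identifier
    "PMID-B": [0.0, 1.0, 0.0],  # hypothetical identifier
    "PMID-C": [0.9, 0.1, 0.0],  # hypothetical identifier
}
results = top_k([1.0, 0.0, 0.0], corpus, k=2)
for doc_id, score in results:
    print(doc_id, round(score, 4))
```

For 50,000 abstracts a brute-force scan like this is still workable, though a vector index would normally replace the Python loop in production.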
Sentence Transformers: Paraphrase Multilingual MiniLM L12 v2 Model Specs Parameters / size: 117M Context length: 128 tokens Primary task: Embeddings (multilingual, sentence similarity) Why it's interesting Multilingual adoption: Supports 50+ languages including Arabic, Chinese, Hebrew, Hindi, Japanese, Korean, Russian, Thai, and Vietnamese—with 18.4 million downloads last month demonstrating production-scale validation across global deployments Compact architecture for edge deployment: At 117M parameters producing 384-dimensional embeddings, this model balances multilingual coverage with inference efficiency, making it ideal for resource-constrained environments or high-throughput applications Sentence-BERT foundation: Based on the influential Sentence-BERT paper (Reimers & Gurevych, 2019), using siamese BERT networks with mean pooling to create semantically meaningful sentence embeddings for clustering, paraphrase detection, and cross-lingual search Community-proven versatility: With 299 fine-tuned variants and 100+ Spaces implementations, this model serves as a peer reviewed starting point for multilingual semantic similarity tasks, from customer support ticket routing to cross-lingual document retrieval Try it E-commerce sample prompt: You're building a global customer support platform for an e-commerce company operating in 30 countries. Deploy Paraphrase Multilingual MiniLM in Microsoft Foundry to process incoming support tickets in English, Spanish, French, German, Portuguese, Japanese, and Korean. Embed each ticket as a 384-dimensional vector and cluster by semantic similarity to automatically route issues to specialized teams (payment, shipping, returns, technical). Flag duplicate tickets with cosine similarity > 0.85 to prevent redundant responses. 
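The duplicate-flagging rule in the sample prompt above (cosine similarity > 0.85) amounts to a pairwise comparison over ticket embeddings. The sketch below assumes the embeddings are already computed; the ticket IDs and 3-dimensional vectors are made up for illustration, and `find_duplicates` is a hypothetical helper rather than part of any library.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def find_duplicates(embeddings: Dict[str, Sequence[float]],
                    threshold: float = 0.85) -> List[Tuple[str, str]]:
    """Return every pair of ticket IDs whose similarity exceeds the threshold."""
    pairs = []
    for (id_a, vec_a), (id_b, vec_b) in combinations(embeddings.items(), 2):
        if cosine_similarity(vec_a, vec_b) > threshold:
            pairs.append((id_a, id_b))
    return pairs


# Toy vectors standing in for 384-dimensional ticket embeddings.
tickets = {
    "T1": [0.9, 0.1, 0.0],  # e.g. "Where is my refund?"
    "T2": [0.8, 0.2, 0.0],  # e.g. the same question in another language
    "T3": [0.0, 0.1, 0.9],  # an unrelated shipping question
}
duplicates = find_duplicates(tickets)
print(duplicates)
```

The comparison is quadratic in the number of tickets, so a real pipeline would likely restrict it to recent tickets or use an approximate nearest-neighbor index; the 0.85 threshold comes from the prompt and would need tuning on real data.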
BAAI: BGE-M3 Model Specs Parameters / size: ~560M Context length: 8192 tokens Primary task: Embeddings (multi-functional: dense, sparse, multi-vector) Why it's interesting Three retrieval modes in one model: Uniquely supports dense retrieval (1024-dim embeddings), sparse retrieval (lexical matching like BM25), and multi-vector retrieval (ColBERT-style fine-grained matching)—enabling hybrid search pipelines without maintaining separate models or indexes Exceptional long-context capability: 8192-token context window handles full documents, legal contracts, research papers, and lengthy technical content—validated on MLDR (13-language document retrieval) and NarrativeQA (long-form question answering) benchmarks Multilingual dominance: Outperforms OpenAI embeddings on MIRACL multilingual retrieval across 13+ languages and demonstrates strong zero-shot cross-lingual transfer on MKQA. Try it Legal document search sample prompt: You're building a legal document search system for a multinational law firm. Deploy BGE-M3 in Microsoft Foundry to index 5,000 full-length commercial contracts (average 6,000 tokens each) in English, French, German, and Spanish. A lawyer queries: "Find all force majeure clauses that exclude liability for pandemics or global health emergencies." Use hybrid retrieval: (1) dense embeddings for semantic similarity to capture concept variations like "Act of God" or "unforeseen circumstances", (2) sparse retrieval for exact keyword matches on "force majeure", "pandemic", "health emergency". Combine scores with weighted sum (0.6 dense + 0.4 sparse) and return top 15 contract sections with clause numbers and jurisdiction metadata. Getting started You can deploy open-source Hugging Face models directly in Microsoft Foundry by browsing the Hugging Face collection in the Foundry model catalog and deploying to managed endpoints in just a few clicks. You can also start from the Hugging Face Hub. 
First, select any supported model and then choose "Deploy on Microsoft Foundry", which brings you straight into Azure with secure, scalable inference already configured. Learn how to discover models and deploy them using Microsoft Foundry documentation.
Follow along with the Model Mondays series and access the GitHub to stay up to date on the latest
Read Hugging Face on Azure docs
Learn about one-click deployments from the Hugging Face Hub on Microsoft Foundry
Explore models in Microsoft Foundry

Unlocking Document Understanding with Mistral Document AI in Microsoft Foundry
Enterprises today face a familiar yet formidable challenge: mountains of documents -contracts, invoices, reports, forms - remain locked in unstructured formats. Traditional OCR (optical character recognition) captures text, but often struggles with context, layout complexity, or multilingual content. The result? Slow workflows, error-prone manual reviews, and missed insights. Enter mistral-document-ai-2512 in Microsoft Foundry. This new model brings together high-end OCR using mistral-ocr-2512 and intelligent document understanding using mistral-small-2506 to turn unstructured documents into actionable data. It doesn’t just “read” pages - it understands them: multi-column layouts, handwritten annotations, tables with merging cells, multilingual content-all processed with enterprise-grade speed and precision. In this blog, we’ll explore what Mistral Document AI 2512 is, why it matters, how it stacks up, and the business impact it promises, especially when paired with solution accelerators like ARGUS. Meet Mistral Document AI Mistral Document AI is an enterprise-grade document understanding model, offered via Microsoft Foundry. It’s built to convert both physical (scans, photos) and digital (PDFs, DOCX) documents into highly structured, machine-readable outputs. Key features include: Top-tier accuracy: According to benchmarks, Mistral’s OCR 2512 stacks display significantly higher accuracy than many alternatives, especially on scanned documents and complex layouts. 
For example, in comparisons it achieved ~95.9 % “overall” vs ~89-91 % for other platforms Global / multilingual reach: In language-by-language tests (Russian, French, German, Spanish, Chinese, etc), Mistral’s error-rate/fuzzy-match metrics reached 99 %+ in many cases Layout & context awareness: It’s built to not just extract linear text but understand multi-column layouts, tables, charts, images, handwritten input and more Structured output functionality: The model supports structured extraction (JSON), markup (Markdown with interleaved images), preserving document structure for downstream systems Enterprise-ready deployment: With availability via Microsoft Foundry and support for private/secure inference, the model is geared for regulated industries and high-volume workflows Putting it another way: where traditional OCR stops at “here’s the raw text on page 7”, Mistral DocumentAI 2512 can say “here’s the vendor invoice, here are line-items, here’s the total, here’s the signature block, and here’s the part that was handwritten”, ready to plug into downstream systems. Business Impact & Industry examples Mistral Document AI isn’t just another OCR tool; it’s a strategic enabler that turns document-heavy operations into intelligent, automated workflows. The business value comes down to four key advantages: Speed and efficiency: Automating document understanding eliminates manual reviews and retyping. 
Tasks that took days can be done in minutes, accelerating core business processes Accuracy and consistency: With 99 %+ recognition accuracy and deep layout understanding, Mistral delivers cleaner data and fewer downstream errors - essential in compliance-critical or analytics-driven operations Cost and productivity gains: Reducing manual extraction frees teams for higher-value work, cutting operational costs while increasing output per employee Scalability and adaptability: Cloud-native performance allows organizations to scale document processing instantly during peak loads, across multiple languages and formats, without sacrificing quality Overall, mistral-document-ai-2512 excels where consistency and quality are critical. Industry and Use Cases In regulated industries or big-data scenarios, even a small improvement in accuracy or speed can translate into substantial business gains. Its benchmarks indicate not just incremental progress, but a major step forward - giving enterprises a powerful new engine for their document workflows. Here’s where that impact becomes tangible: Financial services: Banks and insurers handle vast document volumes - loan applications, KYC forms, and claims reports - where data integrity and auditability are non-negotiable. Mistral automates extraction, classification, and clause identification across diverse formats, improving turnaround time and compliance accuracy while reducing manual handling costs Healthcare & life sciences: Clinical records, lab results, and insurance claims often combine handwritten, tabular, and multi-language content. Mistral’s layout awareness and multilingual support ensure clean, structured datasets for downstream analytics and regulatory submissions Manufacturing & logistics: From quality certificates to shipping manifests, Mistral streamlines the flow of operational documents. 
It can extract production parameters, vendor data, and timestamps at scale - building a unified, queryable data layer that supports supply chain traceability Legal & public sector: Legal teams and agencies depend on consistency and transparency. Mistral helps index, summarise, and validate contracts or permits with full structural fidelity - dramatically cutting review cycles while maintaining evidential quality Retail & consumer goods: Retailers process supplier invoices, product specifications, and marketing briefs from global partners. With Mistral’s multilingual precision and structure preservation, global document flows become searchable and analytics-ready Across these industries, the result is the same: cleaner data, faster throughput, and fewer human errors - the foundation for more reliable decisions and more agile operations. Pricing Argus – A ready-to-implement accelerator to start using Mistral Document AI To spin up a solution faster, one can leverage solution accelerators such as ARGUS (open-source repository available on GitHub). ARGUS serves as a full-pipeline implementation: from document ingestion, OCR/extraction (via Mistral Document AI), to downstream processing and structured output. It shows how to deploy end-to-end, integrate with storage, preprocess documents, handle large-scale batches, output JSON schemas, and integrate into existing business workflows. Mistral Document AI Integration ARGUS now offers flexible OCR provider selection with Mistral Document AI as one of the several options. This enhancement gives you the freedom to choose the best OCR engine for your specific document processing needs. 
Key Features: Dual Provider Support: Toggle between Azure Document Intelligence (default) and Mistral Document AI Runtime Switching: Change OCR providers on-the-fly through the Settings UI without redeployment Simple Configuration: Set up Mistral via environment variables (OCR_PROVIDER, MISTRAL_DOC_AI_ENDPOINT, MISTRAL_DOC_AI_KEY) or the web interface Seamless Integration: Both providers expose the same interface, ensuring consistent behavior across your document processing pipeline Why This Matters: Different OCR engines excel at processing different document content. Azure Document Intelligence offers enterprise-grade form and table recognition, while Mistral Document AI 2512, in addition, enables extraction to structured JSON with customizable schemas, document classification, and image processing—including text, charts, and signatures. It can convert charts into tables, extract fine print from figures, and even define custom image types for specialized workflows. Now you can select the optimal provider for each use case. In effect, instead of building from scratch, ARGUS gives you the legs to run: pipeline orchestration, ingestion, error-handling, schema-mapping, output integration-all wired to Mistral’s engine. This significantly accelerates time-to-value and reduces risk for enterprise adopters. Getting Started: Navigate to the ARGUS frontend interface (Streamlit app) and click on the Settings tab. In the OCR Provider Configuration section, select your preferred provider. If using Mistral, enter your endpoint URL, API key, and model name. Click Update OCR Provider to apply changes immediately—no restart required. All new document processing will use your selected OCR engine. 
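As an illustration of the environment-variable route mentioned above, the sketch below reads `OCR_PROVIDER`, `MISTRAL_DOC_AI_ENDPOINT`, and `MISTRAL_DOC_AI_KEY` and fails fast when the Mistral settings are incomplete. Only the three variable names come from the text; the accepted values (`"azure"`, `"mistral"`), the default, and the `OcrSettings` / `load_ocr_settings` names are assumptions made for this example and may not match how ARGUS itself parses its configuration.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class OcrSettings:
    provider: str
    endpoint: Optional[str]
    api_key: Optional[str]


def load_ocr_settings(env: Mapping[str, str] = os.environ) -> OcrSettings:
    """Read OCR provider settings from the environment.

    The provider values "azure" and "mistral" are assumed for illustration.
    """
    provider = env.get("OCR_PROVIDER", "azure").strip().lower()
    endpoint = env.get("MISTRAL_DOC_AI_ENDPOINT")
    api_key = env.get("MISTRAL_DOC_AI_KEY")
    # Fail early rather than at the first document if Mistral is selected
    # but its endpoint or key is missing.
    if provider == "mistral" and not (endpoint and api_key):
        raise ValueError(
            "OCR_PROVIDER is 'mistral' but MISTRAL_DOC_AI_ENDPOINT or "
            "MISTRAL_DOC_AI_KEY is not set"
        )
    return OcrSettings(provider=provider, endpoint=endpoint, api_key=api_key)


if __name__ == "__main__":
    print(load_ocr_settings().provider)
```

Passing the environment in as a mapping keeps the function easy to test without touching the real process environment.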
If your organization is looking to unlock document intelligence, here’s a structured path:
Explore Mistral Document AI via Microsoft Foundry: Browse the model card, review endpoint specs, and try sample documents to test accuracy and extraction structure.
Deploy and pilot with ARGUS: Use the GitHub repo to spin up an end-to-end pipeline on a small workload (e.g., a batch of invoices or contracts) and compare manual vs AI-driven throughput and error rates.
Define business value metrics: Track processing time, error rate, manual hours saved, and downstream impact (faster decision cycles, fewer reworks).
Scale and govern: Once the pilot proves value, expand into multiple document types, languages, and geographies, and ensure governance (data handling, compliance, model monitoring).
Embed continuous improvement: As usage grows, feed back learnings, tune schema definitions, refine extraction rules, and extend into QA, insights or analytics layers.

Conclusion
In today’s data-rich but document-heavy environment, the ability to truly understand documents (and not just digitize them) is becoming a strategic imperative. Mistral Document AI represents a next-generation shift: accurate, layout-aware, multilingual, structured. When paired with accelerators like ARGUS, enterprises can move from manual bottlenecks to streamlined, insight-rich document workflows. If you’re thinking about unlocking the value buried in your documents, be it invoices, contracts, forms or reports, now is the time. With mistral-document-ai-2512, what used to be a cost center is now a potential performance lever. Ready to get started? Explore the model, and let your documents begin talking back.

Unlocking Advanced Data Analytics & AI with Azure NetApp Files object REST API
Azure NetApp Files object REST API enables object access to enterprise file data stored on Azure NetApp Files, without copying, moving, or restructuring that data. This capability allows analytics and AI platforms that expect object storage to work directly against existing NFS-based datasets, while preserving Azure NetApp Files’ performance, security, and governance characteristics.

Now in Foundry: Qwen3-Coder-Next, Qwen3-ASR-1.7B, Z-Image
This week's spotlight features three models that demonstrate enterprise-grade AI across the full scope of modalities. From low-latency coding agents to state-of-the-art multilingual speech recognition and foundation-quality image generation, these models showcase the breadth of innovation happening in open-source AI. Each model balances performance with practical deployment considerations, making them viable for production systems while pushing the boundaries of what's possible in their respective domains. This week's Model Mondays edition highlights Qwen3-Coder-Next, an 80B MoE model that activates only 3B parameters while delivering coding agent capabilities with 256k context; Qwen3-ASR-1.7B, which achieves state-of-the-art accuracy across 52 languages and dialects; and Z-Image from Tongyi-MAI, an undistilled text-to-image foundation model with full Classifier-Free Guidance support for professional creative workflows.

Models of the week

Qwen: Qwen3-Coder-Next

Model Specs
Parameters / size: 80B total (3B activated)
Context length: 262,144 tokens
Primary task: Text generation (coding agents, tool use)

Why it's interesting
Extreme efficiency: Activates only 3B of 80B parameters while delivering performance comparable to models with 10-20x more active parameters, making advanced coding agents viable for local deployment on consumer hardware
Built for agentic workflows: Excels at long-horizon reasoning, complex tool usage, and recovering from execution failures, critical capabilities for autonomous development that go beyond simple code completion
Benchmarks: Competitive performance with significantly larger models on SWE-bench and coding benchmarks (Technical Report)

Try it
Use Case | Prompt Pattern
Code generation with tool use | Provide task context, available tools, and execution environment details
Long-context refactoring | Include full codebase context within 256k window with specific refactoring goals
Autonomous debugging | Present error logs, stack traces, and relevant code with failure recovery instructions
Multi-file code synthesis | Describe architecture requirements and file structure expectations

Financial services sample prompt: You are a coding agent for a fintech platform. Implement a transaction reconciliation service that processes batches of transactions, detects discrepancies between internal records and bank statements, and generates audit reports. Use the provided database connection tool, logging utility, and alert system. Handle edge cases including partial matches, timing differences, and duplicate transactions. Include unit tests with 90%+ coverage.

Qwen: Qwen3-ASR-1.7B

Model Specs
Parameters / size: 1.7B
Context length: 256 tokens (default), configurable up to 4096
Primary task: Automatic speech recognition (multilingual)

Why it's interesting
All-in-one multilingual capability: Single 1.7B model handles language identification plus speech recognition for 30 languages, 22 Chinese dialects, and English accents from multiple regions, eliminating the need to manage separate models per language
Specialized audio versatility: Transcribes not just clean speech but singing voice, songs with background music, and extended audio files, expanding use cases beyond traditional ASR to entertainment and media workflows
State-of-the-art accuracy: Outperforms GPT-4o, Gemini-2.5, and Whisper-large-v3 across multiple benchmarks.
English: Tedlium 4.50 WER vs 7.69/6.15/6.84; Chinese: WenetSpeech 4.97/5.88 WER vs 15.30/14.43/9.86 (Technical Paper) Language ID included: 97.9% average accuracy across benchmark datasets for automatic language identification, eliminating the need for separate language detection pipelines Try it Use Case Prompt Pattern Multilingual transcription Send audio files via API with automatic language detection Call center analytics Process customer service recordings to extract transcripts and identify languages Content moderation Transcribe user-generated audio content across multiple languages Meeting transcription Convert multilingual meeting recordings to text for documentation Customer support sample prompt: Deploy Qwen3-ASR-1.7B to a Microsoft Foundry endpoint and transcribe multilingual customer service calls. Send audio files via API to automatically detect the language (from 52 supported options including 30 languages and 22 Chinese dialects) and generate accurate transcripts. Process calls from customers speaking English, Spanish, Mandarin, Cantonese, Arabic, French, and other languages without managing separate models per language. Use transcripts for quality assurance, compliance monitoring, and customer sentiment analysis. 
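The accuracy figures quoted for Qwen3-ASR-1.7B above are word error rates (WER). For readers unfamiliar with the metric, here is a minimal sketch of the standard definition: the word-level edit distance between a reference transcript and the model output, divided by the number of reference words. Benchmark tables usually report the result as a percentage. The sentences are invented, and real evaluations typically normalize text (casing, punctuation) first, which this sketch omits.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # Row-by-row Levenshtein distance over words instead of characters.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# One deleted word ("my") and one substitution ("password" -> "passport")
# against a 5-word reference gives 2 / 5.
wer = word_error_rate("please reset my account password",
                      "please reset account passport")
print(f"{wer:.2f}")  # 0.40
```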
Tongyi-MAI: Z-Image

Model Specs
Parameters / size: 6B
Context length: N/A (text-to-image)
Primary task: Text-to-image generation

Why it's interesting
Undistilled foundation model: Full-capacity base without distillation preserves complete training signal with Classifier-Free Guidance support (a technique that improves prompt adherence and output quality), enabling complex prompt engineering and negative prompting that distilled models cannot achieve
High output diversity: Generates distinct character identities in multi-person scenes with varied compositions, facial features, and lighting, critical for creative applications requiring visual variety rather than consistency
Aesthetic versatility: Handles diverse visual styles from hyper-realistic photography to anime and stylized illustrations within a single model, supporting resolutions from 512×512 to 2048×2048 at any aspect ratio with 28-50 inference steps (Technical Paper)

Try it
E-commerce sample prompt: Professional product photography of a modern ergonomic office chair in a bright Scandinavian-style home office. Natural window lighting from left, clean white desk with laptop and succulent plant, light oak hardwood floor. Chair positioned at 45-degree angle showing design details. Photorealistic, commercial photography, sharp focus, 85mm lens, f/2.8, soft shadows.

Getting started
You can deploy open‑source Hugging Face models directly in Microsoft Foundry by browsing the Hugging Face collection in the Foundry model catalog and deploying to managed endpoints in just a few clicks. You can also start from the Hugging Face Hub.
First, select any supported model and then choose "Deploy on Microsoft Foundry", which brings you straight into Azure with secure, scalable inference already configured. Learn how to discover models and deploy them using Microsoft Foundry documentation.
Follow along with the Model Mondays series and access the GitHub to stay up to date on the latest
Read Hugging Face on Azure docs
Learn about one-click deployments from the Hugging Face Hub on Microsoft Foundry
Explore models in Microsoft Foundry

Building a Secure and Compliant Azure AI Landing Zone: Policy Framework & Best Practices
As organizations accelerate their AI adoption on Microsoft Azure, governance, compliance, and security become critical pillars for success. Deploying AI workloads without a structured compliance framework can expose enterprises to data privacy issues, misconfigurations, and regulatory risks. To address this challenge, the Azure AI Landing Zone provides a scalable and secure foundation — bringing together Azure Policy, Blueprints, and Infrastructure-as-Code (IaC) to ensure every resource aligns with organizational and regulatory standards. The Azure Policy & Compliance Framework acts as the governance backbone of this landing zone. It enforces consistency across environments by applying policy definitions, initiatives, and assignments that monitor and remediate non-compliant resources automatically. This blog will guide you through: 🧭 The architecture and layers of an AI Landing Zone 🧩 How Azure Policy as Code enables automated governance ⚙️ Steps to implement and deploy policies using IaC pipelines 📈 Visualizing compliance flows for AI-specific resources What is Azure AI Landing Zone (AI ALZ)? AI ALZ is a foundational architecture that integrates core Azure services (ML, OpenAI, Cognitive Services) with best practices in identity, networking, governance, and operations. To ensure consistency, security, and responsibility, a robust policy framework is essential. Policy & Compliance in AI ALZ Azure Policy helps enforce standards across subscriptions and resource groups. You define policies (single rules), group them into initiatives (policy sets), and assign them with certain scopes & exemptions. Compliance reporting helps surface noncompliant resources for mitigation. 
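To make the idea of a policy as a "single rule" concrete, the sketch below builds an Azure Policy definition as a Python dictionary and applies a toy compliance check to two sample resources. The `if`/`then` rule shape with `field`, `in`, and a `deny` effect follows the general Azure Policy definition structure; the display name, region list, and resource names are illustrative, and `violates` is a stand-in written for this example, not Azure's evaluation engine.

```python
import json

# Illustrative policy definition: deny resources outside an allowed region list.
policy_definition = {
    "properties": {
        "displayName": "Allowed locations for AI resources (illustrative)",
        "mode": "Indexed",
        "parameters": {
            "allowedLocations": {
                "type": "Array",
                "defaultValue": ["westeurope", "northeurope"],  # assumed regions
            }
        },
        "policyRule": {
            "if": {
                "not": {
                    "field": "location",
                    "in": "[parameters('allowedLocations')]",
                }
            },
            "then": {"effect": "deny"},
        },
    }
}


def violates(resource: dict, definition: dict) -> bool:
    """Toy check for this one rule shape only: is the location outside the allowed list?"""
    allowed = definition["properties"]["parameters"]["allowedLocations"]["defaultValue"]
    return resource["location"].lower() not in allowed


# Hypothetical resources used only to exercise the check.
resources = [
    {"name": "aml-workspace-eu", "location": "westeurope"},
    {"name": "openai-us", "location": "eastus"},
]
non_compliant = [r["name"] for r in resources if violates(r, policy_definition)]

print(json.dumps(policy_definition["properties"]["policyRule"], indent=2))
print(non_compliant)
```

In a real landing zone the definition would live as JSON in a repository, be grouped with related rules into an initiative, and be assigned at a management group or subscription scope.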
In AI workloads, some unique considerations: Sensitive data (PII, models) Model accountability, logging, audit trails Cost & performance from heavy compute usage Preview features and frequent updates Scope This framework covers: Azure Machine Learning (AML) Azure API Management Azure AI Foundry Azure App Service Azure Cognitive Services Azure OpenAI Azure Storage Accounts Azure Databases (SQL, Cosmos DB, MySQL, PostgreSQL) Azure Key Vault Azure Kubernetes Service Core Policy Categories 1. Networking & Access Control Restrict resource deployment to approved regions (e.g., Europe only). Enforce private link and private endpoint usage for all critical resources. Disable public network access for workspaces, storage, search, and key vaults. 2. Identity & Authentication Require user-assigned managed identities for resource access. Disable local authentication; enforce Microsoft Entra ID (Azure AD) authentication. 3. Data Protection Enforce encryption at rest with customer-managed keys (CMK). Restrict public access to storage accounts and databases. 4. Monitoring & Logging Deploy diagnostic settings to Log Analytics for all key resources. Ensure activity/resource logs are enabled and retained for at least one year. 5. Resource-Specific Guardrails Apply built-in and custom policy initiatives for OpenAI, Kubernetes, App Services, Databases, etc. A detailed list of all policies is bundled and attached at the end of this blog. Be sure to check it out for a ready-to-use Excel file—perfect for customer workshops—which includes policy type (Standalone/Initiative), origin (Built-in/Custom), and more. Implementation: Policy-as-Code using EPAC To turn policies from Excel/JSON into operational governance, Enterprise Policy as Code (EPAC) is a powerful tool. EPAC transforms policy artifacts into a desired state repository and handles deployment, lifecycle, versioning, and CI/CD automation. What is EPAC & Why Use It? 
EPAC is a set of PowerShell scripts and modules that deploy policy definitions, initiatives, assignments, role assignments, and exemptions.
It supports CI/CD integration (GitHub Actions, Azure DevOps) so policy changes can be treated like code.
It handles ordering, dependency resolution, and enforcement of a “desired state”: any policy resources not in your repo may be pruned (depending on configuration).
It integrates with Azure Landing Zones (including governance baseline) out of the box.

References & Further Reading
EPAC GitHub Repository
Advanced Azure Policy management - Microsoft Learn [Advanced A...Framework]
How to deploy Azure policies the DevOps way [How to dep...- Rabobank]

Architecting an Azure AI Hub-and-Spoke Landing Zone for Multi-Tenant Enterprises
A large enterprise customer adopting AI at scale typically needs three non‑negotiables in its AI foundation: End‑to‑end tenant isolation across network, identity, compute, and data Secure, governed traffic flow from users to AI services Transparent chargeback/showback for shared AI and platform services At the same time, the platform must enable rapid onboarding of new tenants or applications and scale cleanly from proof‑of‑concept to production. This article proposes an Azure Landing Zone–aligned architecture using a Hub‑and‑Spoke model, where: The AI Hub centralizes shared services and governance AI Spokes host tenant‑dedicated AI resources Application logic and AI agents run on AKS The result is a secure, scalable, and operationally efficient enterprise AI foundation. 1. Architecture goals & design principles Goals Host application logic and AI agents on Azure Kubernetes Service (AKS) as custom deployments instead of using agents under Azure AI Foundry Enforce strong tenant isolation across all layers Support cross chargeback and cost attribution Adopt a Hub‑and‑Spoke model with clear separation of shared vs. tenant‑specific services Design principles (Azure Landing Zone aligned) Azure Landing Zone (ALZ) guidance emphasizes: Separation of platform and workload subscriptions Management groups and policy inheritance Centralized connectivity using hub‑and‑spoke networking Policy‑driven governance and automation For infrastructure as code, ALZ‑aligned deployments typically use Bicep or Terraform, increasingly leveraging Azure Verified Modules (AVM) for consistency and long‑term maintainability. 2. 
Subscription & management group model A practical enterprise layout looks like this: Tenant Root Management Group o Platform Management Group Connectivity subscription (Hub VNet, Firewall, DNS, ExpressRoute/VPN) Management subscription (Log Analytics, Monitor) Security subscription (Defender for Cloud, Sentinel if required) o AI Hub Management Group AI Hub subscription (shared AI and governance services) o AI Spokes Management Group One subscription per tenant, business unit, or regulated boundary This structure supports enterprise‑scale governance while allowing teams to operate independently within well‑defined guardrails. 3. Logical architecture — AI Hub vs. AI Spoke AI Hub (central/shared services) The AI Hub acts as the governed control plane for AI consumption: Ingress & edge security: Azure Application Gateway with WAF (or Front Door for global scenarios) Central egress control: Azure Firewall with forced tunneling API governance: Azure API Management (private/internal mode) Shared AI services: Azure OpenAI (shared deployments where appropriate), safety controls Monitoring & observability: Azure Monitor, Log Analytics, centralized dashboards Governance: Azure Policy, RBAC, naming and tagging standards All tenant traffic enters through the hub, ensuring consistent enforcement of security, identity, and usage policies. AI Spoke (tenant‑dedicated services) Each AI Spoke provides a tenant‑isolated data and execution plane: Tenant‑dedicated storage accounts and databases Vector stores and retrieval systems (Azure AI Search with isolated indexes or services) AKS runtime for tenant‑specific AI agents and backend services Tenant‑scoped keys, secrets, and identities 4. Logical architecture diagram (Hub vs. Spoke) 5. Network architecture — Hub and Spoke 6. 
Tenant onboarding & isolation strategy

Tenant onboarding flow
Tenant onboarding is automated using a landing zone vending model:
  Request new tenant or application
  Provision a spoke subscription and baseline policies
  Deploy spoke VNet and peer to hub
  Configure private DNS and firewall routes
  Deploy AKS tenancy and data services
  Register identities and API subscriptions
  Enable monitoring and cost attribution
This approach enables consistent, repeatable onboarding with minimal manual effort.

Isolation by design
  Network: Dedicated VNets, private endpoints, no public AI endpoints
  Identity: Microsoft Entra ID with tenant‑aware claims and conditional access
  Compute: AKS isolation using namespaces, node pools, or dedicated clusters
  Data: Per‑tenant storage, databases, and vector indexes

7. Identity & access management (Microsoft Entra ID)
Key IAM practices include:
  Central Microsoft Entra ID tenant for authentication and authorization
  Application and workload identities using managed identities
  Tenant context enforced at API Management and propagated downstream
  Conditional Access and least‑privilege RBAC
This ensures zero‑trust access while supporting both internal and partner scenarios.

8. Secure traffic flow (end‑to‑end)
  User accesses application via Application Gateway + WAF
  Traffic inspected and routed through Azure Firewall
  API Management validates identity, quotas, and tenant context
  AKS workloads invoke AI services over Private Link
  Responses return through the same governed path
This pattern provides full auditability, threat protection, and policy enforcement.

9. AKS multitenancy options
  Model | When to use | Characteristics
  Namespace per tenant | Default | Cost‑efficient, logical isolation
  Dedicated node pools | Medium isolation | Reduced noisy‑neighbor risk
  Dedicated AKS cluster | High compliance | Maximum isolation, higher cost
Enterprises typically adopt a tiered approach, choosing the isolation level per tenant based on regulatory and risk requirements.

10. 
Cost management & chargeback model

Tagging strategy (mandatory)
  tenantId
  costCenter
  application
  environment
  owner
These tags are enforced via Azure Policy across all subscriptions.

Chargeback approach
  Dedicated spoke resources: Direct attribution via subscription and tags
  Shared hub resources: Allocated using usage telemetry
    o API calls and token usage from API Management
    o CPU/memory usage from AKS namespaces
Cost data is exported to Azure Cost Management and visualized using Power BI to support showback and chargeback.

11. Security controls checklist
  Private endpoints for AI services, storage, and search
  No public network access for sensitive services
  Azure Firewall for centralized egress and inspection
  WAF for OWASP protection
  Azure Policy for governance and compliance

12. Deployment & automation
  Foundation: Azure Landing Zone accelerators (Bicep or Terraform)
  Workloads: Modular IaC for hub and spokes
  AKS apps: GitOps (Flux or Argo CD)
  Observability: Policy‑driven diagnostics and centralized logging

13. Final thoughts
This Azure AI Landing Zone design provides a repeatable, secure, and enterprise‑ready foundation for any large customer adopting AI at scale. By combining:
  Hub‑and‑Spoke networking
  AKS‑based AI agents
  Strong tenant isolation
  FinOps‑ready chargeback
  Azure Landing Zone best practices
organizations can confidently move AI workloads from experimentation to production without sacrificing security, governance, or cost transparency.

Disclaimer: While the above article discusses hosting custom agents on AKS alongside customer-developed application logic, the following sections focus on a baseline deployment model with no customizations. This approach uses Azure AI Foundry, where models and agents are fully managed by Azure, with centrally governed LLMs (AI Hub) hosted in Azure AI Foundry and agents deployed in a spoke environment.
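As a rough illustration of the chargeback approach in section 10, the sketch below splits shared hub costs across tenants in proportion to usage and checks a resource for the mandatory tags. It is a minimal sketch under stated assumptions, not the article's implementation: the `TenantUsage` record, the function names, and the choice to split AI cost by token share and AKS cost by CPU share are all hypothetical, and the usage numbers are assumed to have already been exported from API Management and AKS namespace metrics.

```python
from dataclasses import dataclass

# Mandatory tag keys listed in the tagging strategy (section 10).
REQUIRED_TAGS = ("tenantId", "costCenter", "application", "environment", "owner")


@dataclass(frozen=True)
class TenantUsage:
    """Hypothetical per-tenant usage record for one billing period."""
    tenant_id: str
    tokens: int            # token usage, assumed to come from API Management telemetry
    cpu_core_hours: float  # CPU usage, assumed to come from AKS namespace metrics


def missing_tags(resource_tags: dict) -> list:
    """Return the mandatory tag keys that are absent or empty on a resource."""
    return [key for key in REQUIRED_TAGS if not resource_tags.get(key)]


def allocate_shared_cost(usages, ai_cost: float, aks_cost: float) -> dict:
    """Split shared hub costs across tenants in proportion to their usage.

    ai_cost is split by token share and aks_cost by CPU share. Both are
    assumed to be totals for the same period as the usage records.
    """
    total_tokens = sum(u.tokens for u in usages)
    total_cpu = sum(u.cpu_core_hours for u in usages)
    allocation = {}
    for u in usages:
        # Guard against division by zero when no usage was recorded.
        token_share = u.tokens / total_tokens if total_tokens else 0.0
        cpu_share = u.cpu_core_hours / total_cpu if total_cpu else 0.0
        allocation[u.tenant_id] = round(ai_cost * token_share + aks_cost * cpu_share, 2)
    return allocation


if __name__ == "__main__":
    usages = [
        TenantUsage("tenant-a", tokens=750, cpu_core_hours=10.0),
        TenantUsage("tenant-b", tokens=250, cpu_core_hours=30.0),
    ]
    print(allocate_shared_cost(usages, ai_cost=1000.0, aks_cost=400.0))
    print(missing_tags({"tenantId": "tenant-a", "owner": "team-x"}))
```

Because each tenant's amount is rounded to two decimals, the allocations may differ from the shared total by a cent or so; a real showback report would likely reconcile that remainder explicitly.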
🚀 Get Started: Building a Secure & Scalable Azure AI Platform To help you accelerate your Azure AI journey, Microsoft and the community provide several reference architectures, solution accelerators, and best-practice guides. Together, these form a strong foundation for designing secure, governed, and cost-efficient GenAI and AI workloads at scale. Below is a recommended starting path. 1️⃣ AI Landing Zone (Foundation) Purpose: Establish a secure, enterprise-ready foundation for AI workloads. The AI Landing Zone extends the standard Azure Landing Zone with AI-specific considerations such as: Network isolation and hub-spoke design Identity and access control for AI services Secure connectivity to data sources Alignment with enterprise governance and compliance 🔗 AI Landing Zone (GitHub): https://github.com/Azure/AI-Landing-Zones?tab=readme-ov-file 👉 Start here if you want a standardized baseline before onboarding any AI workloads. 2️⃣ AI Hub Gateway – Solution Accelerator Purpose: Centralize and control access to AI services across multiple teams or customers. The AI Hub Gateway Solution Accelerator helps you: Expose AI capabilities (models, agents, APIs) via a centralized gateway Apply consistent security, routing, and traffic controls Support both Chat UI and API-based consumption Enable multi-team or multi-tenant AI usage patterns 🔗 AI Hub Gateway Solution Accelerator: https://github.com/mohamedsaif/ai-hub-gateway-landing-zone?tab=readme-ov-file 👉 Ideal when you want a shared AI platform with controlled access and visibility. 3️⃣ Citadel Governance Hub (Advanced Governance) Purpose: Enforce strong governance, compliance, and guardrails for AI usage. 
The Citadel Governance Hub builds on top of the AI Hub Gateway and focuses on:
  Policy enforcement for AI usage
  Centralized governance controls
  Secure onboarding of teams and workloads
  Alignment with enterprise risk and compliance requirements
🔗 Citadel Governance Hub (README): https://github.com/Azure-Samples/ai-hub-gateway-solution-accelerator/blob/citadel-v1/README.md
👉 Recommended for regulated environments or large enterprises with strict governance needs.

4️⃣ AKS Cost Analysis (Operational Excellence)
Purpose: Understand and optimize the cost of running AI workloads on AKS.
AI platforms often rely on AKS for agents, inference services, and gateways. This guide explains:
  How AKS costs are calculated
  How to analyze node, pod, and workload costs
  Techniques to optimize cluster spend
🔗 AKS Cost Analysis: https://learn.microsoft.com/en-us/azure/aks/cost-analysis
👉 Use this early to avoid unexpected cost overruns as AI usage scales.

5️⃣ AKS Multi-Tenancy & Cluster Isolation
Purpose: Safely run workloads for multiple teams or customers on AKS.
This guidance covers:
  Namespace vs cluster isolation strategies
  Security and blast-radius considerations
  When to use shared clusters vs dedicated clusters
  Best practices for multi-tenant AKS platforms
🔗 AKS Multi-Tenancy & Cluster Isolation: https://learn.microsoft.com/en-us/azure/aks/operator-best-practices-cluster-isolation
👉 Critical reading if your AI platform supports multiple teams, business units, or customers.

🧭 Suggested Learning Path
If you’re new, follow this order:
  AI Landing Zone → build the foundation
  AI Hub Gateway → centralize AI access
  Citadel Governance Hub → enforce guardrails
  AKS Cost Analysis → control spend
  AKS Multi-Tenancy → scale securely

What is trending in Hugging Face on Microsoft Foundry? Feb 2, 2026
Open‑source AI is moving fast, with important breakthroughs in reasoning, agentic systems, multimodality, and efficiency emerging every day. Hugging Face has been a leading platform where researchers, startups, and developers share and discover new models. Microsoft Foundry brings these trending Hugging Face models into a production‑ready experience, where developers can explore, evaluate, and deploy them within their Azure environment. Our weekly Model Mondays series highlights Hugging Face models available in Foundry, focusing on what matters most to developers: why a model is interesting, where it fits, and how to put it to work quickly.

This week’s Model Mondays edition highlights three Hugging Face models: a powerful Mixture-of-Experts model from Z.AI designed for lightweight deployment, Meta’s unified foundation model for image and video segmentation, and MiniMax’s latest open-source agentic model optimized for complex workflows.

Models of the week

Z.AI’s GLM-4.7-Flash

Model Basics
  Model name: zai-org/GLM-4.7-Flash
  Parameters / size: 30B total, 3B active
  Default settings: 131,072 max new tokens
  Primary task: Agentic, Reasoning and Coding

Why this model matters
  Why it’s interesting: It uses a Mixture-of-Experts (MoE) architecture (30B total parameters, 3B active parameters) to offer a new option for lightweight deployment. It demonstrates strong performance on logic and reasoning benchmarks, outperforming similarly sized models such as gpt-oss-20b on the AIME 25 and GPQA benchmarks. It supports advanced inference features such as "Preserved Thinking" mode for multi-turn agentic tasks.
  Best‑fit use cases: Lightweight local deployment, multi-turn agentic tasks, and logical reasoning applications.
  What’s notable: From the Foundry catalog, users can deploy the model on an A100 instance, or deploy unsloth/GLM-4.7-Flash-GGUF on a CPU. GLM-4.7-Flash achieves open-source SOTA scores among models of comparable size.
Additionally, compared to similarly sized models, GLM-4.7-Flash demonstrates superior frontend and backend development capabilities. Click to see more: https://docs.z.ai

Try it
  Use case | Best‑practice prompt pattern
  Agentic coding (multi‑step repo work, debugging, refactoring) | Treat the model as an autonomous coding agent, not a snippet generator. Explicitly require task decomposition and step‑by‑step execution, then a single consolidated result.
  Long‑context agent workflows (local or low‑cost autonomous agents) | Call out long‑horizon consistency and context preservation. Instruct the model to retain earlier assumptions and decisions across turns.

Now that you know GLM‑4.7‑Flash works best when you give it a clear goal and let it reason through a bounded task, here’s an example prompt that a product or engineering team might use to identify risks and propose mitigations:

You are a software reliability analyst for a mid‑scale SaaS platform. Review recent incident reports, production logs, and customer issues to uncover edge‑case failures outside normal usage (e.g., rare inputs, boundary conditions, timing/concurrency issues, config drift, or unexpected feature interactions). Prioritize low‑frequency, high‑impact risks that standard testing misses. Recommend minimal, low‑cost fixes (validation, guardrails, fallback logic, or documentation). Deliver a concise executive summary with sections: Observed Edge Cases, Root Causes, User Impact, Recommended Lightweight Fixes, and Validation Steps.

Meta's Segment Anything 3 (SAM3)

Model Basics
  Model name: facebook/sam3
  Parameters / size: 0.9B
  Primary task: Mask Generation, Promptable Concept Segmentation (PCS)

Why this model matters
  Why it’s interesting: It handles a vastly larger set of open-vocabulary prompts than SAM 2, and unifies image and video segmentation capabilities. It includes a "SAM 3 Tracker" mode that acts as a drop-in replacement for SAM 2 workflows with improved performance.
  Best‑fit use cases: Open-vocabulary object detection, video object tracking, and automatic mask generation.
  What’s notable: Introduces Promptable Concept Segmentation (PCS), allowing users to find all matching objects (e.g., "dial") via a text prompt rather than just single instances.

Try it
This model enables users to identify specific objects within video footage and isolate them over extended periods. With just one line of code, it is possible to detect multiple similar objects simultaneously. The accompanying GIF demonstrates how SAM3 efficiently highlights players wearing white on the field as they appear and disappear from view. Additional examples are available at the following repository: https://github.com/facebookresearch/sam3/blob/main/assets/player.gif

  Use case | Best‑practice prompt pattern
  Open‑vocabulary object detection (text concept prompts) | Treat SAM 3 as a concept detector, not an interactive click tool. Use short, concrete noun‑phrase concept prompts instead of describing the scene or asking questions. Example prompt: “yellow school bus” or “shipping containers”. Avoid verbs or full sentences.
  Video segmentation + object tracking | Specify the same concept prompt once, then apply it across the video sequence. Do not restate the prompt per frame. Let the model maintain identity continuity. Example: “person wearing a red jersey”.
  Hard‑to‑name or visually subtle objects | Use exemplar‑based prompts (image region or box) when text alone is ambiguous. Optionally combine positive and negative exemplars to refine the concept. Avoid over‑constraining with long descriptions.

Using the GIF above as a leading example, here is a prompt that shows how SAM 3 turns raw sports footage into structured, reusable data.
By identifying and tracking players based on visual concepts like jersey color, sports leagues can turn tracked data into interactive experiences: built into a larger application, automated player identification can relay stats, fun facts, and more. Here is a prompt that will allow you to start identifying specific players across video:

Act as a sports analytics operator analyzing football match footage. Segment and track all football players wearing blue jerseys across the video. Generate pixel‑accurate segmentation masks for each player and assign persistent instance IDs that remain stable during camera movement, zoom, and player occlusion. Exclude referees, opposing team jerseys, sidelines, and crowd. Output frame‑level masks and tracking metadata suitable for overlays, player statistics, and downstream analytics pipelines.

MiniMax AI's MiniMax-M2.1

Model Basics
  Model name: MiniMaxAI/MiniMax-M2.1
  Parameters / size: 229B total, 10B active
  Default settings: 200,000 max new tokens
  Primary task: Agentic and Coding

Why this model matters
  Why it’s interesting: It is optimized for robustness in coding, tool use, and long-horizon planning, outperforming Claude Sonnet 4.5 in multilingual scenarios. It excels in full-stack application development and is capable of architecting apps "from zero to one". Whereas previous coding models focused on Python optimization, M2.1 brings enhanced capabilities in Rust, Java, Golang, C++, Kotlin, Objective-C, TypeScript, JavaScript, and other languages. The model delivers exceptional stability across various coding agent frameworks.
  Best‑fit use cases: Lightweight local deployment, multi-turn agentic tasks, and logical reasoning applications.
  What’s notable: The release of open-source weights for M2.1 delivers a massive leap over M2 on software engineering leaderboards.
https://www.minimax.io/

Try it
  Use case | Best‑practice prompt pattern
  End‑to‑end agentic coding (multi‑file edits, run‑fix loops) | Treat the model as an autonomous coding agent, not a snippet generator. Explicitly require task decomposition and step‑by‑step execution, then a single consolidated result.
  Long‑horizon tool‑using agents (shell, browser, Python) | Explicitly request stepwise planning and sequential tool use. M2.1’s interleaved thinking and improved instruction‑constraint handling are designed for complex, multi‑step analytical tasks that require evidence tracking and coherent synthesis, not conversational back‑and‑forth.
  Long‑context reasoning & analysis (large documents / logs) | Declare the scope and desired output structure up front. MiniMax‑M2.1 performs best when the objective and final artifact are clear, allowing it to manage long context and maintain coherence.

Because MiniMax‑M2.1 is designed to act as a long‑horizon analytical agent, it shines when you give it a clear end goal and let it work through large volumes of information. Here’s a prompt a risk or compliance team could use in practice:

You are a financial risk analysis agent. Analyze the following transaction logs and compliance policy documents to identify potential regulatory violations and systemic risk patterns. Plan your approach before executing. Work through the data step by step, referencing evidence where relevant. Deliver a final report with the following sections: Key Risk Patterns Identified, Supporting Evidence, Potential Regulatory Impact, Recommended Mitigations. Your response should be a complete, executive-ready report, not a conversational draft.

Getting started
You can deploy open‑source Hugging Face models directly in Microsoft Foundry by browsing the Hugging Face collection in the Foundry model catalog and deploying to managed endpoints in just a few clicks. You can also start from the Hugging Face Hub.
First, select any supported model and then choose "Deploy on Microsoft Foundry", which brings you straight into Azure with secure, scalable inference already configured. Learn how to discover models and deploy them using Microsoft Foundry documentation.
  Follow along the Model Mondays series and access the GitHub to stay up to date on the latest
  Read Hugging Face on Azure docs
  Learn about one-click deployments from the Hugging Face Hub on Microsoft Foundry
  Explore models in Microsoft Foundry

Microsoft Industrial AI Partner Guide: Choosing the Right Data Expertise for Every Stage
As organizations scale Industrial AI, the challenge shifts from technology selection to deciding who should lead which part of the journey -- and when. Which partners should establish secure connectivity? Who enables production grade, AI ready industrial data? When do systems integrators step in to scale globally? This Partner Guide helps customers navigate these decisions with clarity and confidence: Identify which partners align to their current digital transformation and Industrial AI scenarios leveraging Azure IoT and Azure IoT Operations Confidently combine partners over time as they evolve from connectivity to intelligence to autonomous operations This guide focuses on the Industrial AI data plane – the partners and capabilities that extract, contextualize, and operationalize industrial data so it can reliably power AI at scale. It does not attempt to catalog or prescribe end‑to‑end Industrial AI applications or cloud‑hosted AI solutions. Instead, it helps customers understand how industrial partners create the trusted, contextualized data foundation upon which AI solutions can be built. Common Customer Journey Steps 1. Modernize Connectivity & Edge Foundations The industrial transformation journey starts with securely accessing operational data without touching deterministic control loops. Customers connect automation systems to a scalable, standards-based data foundation that modernizes operations while preserving safety, uptime and control. Outcomes customers realize Standardized OT data access across plants and sites Faster onboarding of legacy and new assets Clear OT–IT boundaries that protect safety and uptime Partner strengths at this stage Industrial hardware and edge infrastructure providers Protocol translation and OT connectivity Automation and edge platforms aligned with Azure IoT Operations 2. 
Accelerate Insights with Industrial AI With a consistent edge-to-cloud data plane in place, customers move beyond dashboards to repeatable, production-grade Industrial AI use cases. Customers rely on expert partners to turn standardized operational data into AI‑ready signals that can be consumed by analytics and AI solutions at scale across assets, lines, and sites. Outcomes customers realize Improved Operational efficiency and performance Adaptive facilities and production quality intelligence Energy, safety, and defect detection at scale Partner strengths at this stage Industrial data services that contextualize and standardize OT signals for AI consumption Domain-specific acceleration for common Industrial AI scenarios Data pipelines integrated with Azure IoT Operations and Microsoft Fabric 3. Prepare for Autonomous Operations As organizations advance toward closed‑loop optimization, the focus shifts to safe, scalable autonomy. Customers depend on partners to align data, infrastructure, and operational interfaces, while ensuring ongoing monitoring, governance, and lifecycle management across the full operational estate. Outcomes customers realize Proven reference architectures deployed across plants AI‑ready data foundations that adapt as operations scale Coordinated interaction between OT systems, AI models, and cloud intelligence Partner strengths at this stage Industrial automation leadership and control system expertise Edge infrastructure optimized and ready for Industrial AI scale Systems integrators enabling end‑to‑end implementation and repeatability Data Intelligence Plane of Industrial AI - Partner Matrix This matrix highlights which partners have the deepest expertise in accessing, contextualizing, and operationalizing industrial data so it can reliably power AI at scale. 
The matrix is not a catalog of end‑to‑end Industrial AI applications; it shows how specialized partners contribute data, infrastructure, and integration capabilities on a shared Azure foundation as organizations progress from connectivity to insight to autonomous operations.

How to use this matrix: Start with your scenario → identify primary partner types → layer complementary partners as you scale.

  Partner | Type | Adaptive Cloud Primary Solution | Example Scenarios | Geography
  Advantech | Industrial Hardware, Industrial Connectivity | LoRaWAN gateway integration + Azure IoT Operations | Industrial edge platforms with built-in connectivity, industrial compute, LoRaWAN, sensor networks | Global
  Accenture | GSI | Industrial AI, Digital Transformation, Modernization | OEE, predictive maintenance, real-time defect detection, optimize supply chains, intelligent automation and robotics, energy efficiency | Global
  Avanade | GSI | Factory Agents and Analytics based on Manufacturing Data Solutions | Yield / Quality optimization, OEE, Agentic Root Cause Analysis and process optimization; Unified ISA-95 Manufacturing Data estate on MS Fabric | Global
  Capgemini | GSI | The new AI imperative in manufacturing | OEE, maintenance, defect detection, energy, robotics | Global
  DXC | GSI | Intelligent Boost AI and IoT Analytics Platform | 5G Industrial Connectivity, Defect detection, OEE, safety, energy monitoring | Global
  Innominds | SI | Intelligent Connected Edge Platform | Predictive maintenance, AI on edge, asset tracking | North America, EMEA
  Litmus Automation | Industrial Connectivity, Industrial Data Ops | Litmus Edge + Azure IoT Operations | Edge Data, Smart manufacturing, IIoT deployments at scale | Global, North America
  Mesh Systems | GSI & ISV | Azure IoT & Azure IoT Operations implementation services and solutions (including Azure IoT Operations-aligned connector patterns) | Device connectivity and management, data platforms, visualization, AI agents, and security | North America, EMEA
  Nortal | GSI | Data-driven Industry Solutions | IT/OT Connectivity, Unified Namespace, Digital Twins, Optimization, Edge, Industrial Data, Real‑Time Analytics & AI | EMEA, North America & LATAM
  NVIDIA | Technology Partner | Accelerated AI Infrastructure; Open libraries, models, frameworks, and blueprints for AI development and deployment | Cross industry digitalization and AI development and deployment: Generative AI, Agentic AI, Physical AI, Robotics | Global
  Oracle | ISV | Oracle Fusion Cloud SCM + Azure IoT Operations | Real-time manufacturing intelligence, AI-powered insights, and automated production workflows | Global
  Rockwell Automation | Industrial Automation | FactoryTalk Optix + Azure IoT Operations | Factory modernization, visualization, edge orchestration, DataOps with connectivity context at scale, AI ops and services, physical equipment, MES | Global
  Schneider Electric | Industrial Automation | Industrial Edge | Physical equipment, Device modernization, energy, grid | Global
  Siemens | Industrial Automation & Software | Industrial Edge + Azure IoT Operations reference architecture | Industrial edge infrastructure at scale, OT/IT convergence, DataOps, Industrial AI suite, virtualized automation | Global
  Sight Machine | ISV | Integrated Industrial AI Stack | Industrial AI, bottling, process optimization | Global
  Softing Industrial | Industrial Connectivity | edgeConnector + Azure IoT Operations | OT connectivity, multi-vendor PLC and machine data integration, OPC UA information model deployment | EMEA, Global
  TCS | GSI | Sensor to cloud intelligence | Operations optimization, healthcare digital twin experiences, supply chain monitoring | Global

This Ecosystem Model enables Industrial AI solutions to scale through clear roles, respected boundaries, and composable systems:

Control systems continue to be driven by automation leaders
Safety‑critical, deterministic control stays with industrial automation partners who manage real‑time operations and plant safety. Customers modernize analytics and AI while preserving uptime, reliability, and operational integrity.
Data, AI, and analytics scale independently
A consistent edge-to-cloud data plane supports cloud-scale analytics and AI, accelerating insight delivery without entangling control systems or slowing operational change. This separation allows customers and software providers to build AI solutions on top of a stable, industrial‑grade data foundation without redefining control system responsibilities.

Specialized partners align solutions across the estate
Partners contribute focused expertise across connectivity, analytics, security, and operations, assembling solutions that reduce integration risk, shorten deployment cycles, and speed time to value across the operational estate.

From vision to production
Industrial AI at scale depends on turning operational data into trusted, contextualized intelligence safely, repeatably, and across the enterprise. This guide shows how industrial partners, aligned on a shared Azure foundation, create the data plane that enables AI solutions to succeed in production. When data is ready, intelligence scales.

Call to action: Use this guide to identify the partners and capabilities that best align to your current Industrial AI needs and take the next step toward production‑ready outcomes on Azure.

Answer synthesis in Foundry IQ: Quality metrics across 10,000 queries
With answers, you can control your entire RAG pipeline directly in Foundry IQ by Azure AI Search, without integrations. Responding only when the data supports it, answers delivers grounded, steerable, citation-rich responses and traces each piece of information to its original source. Here’s how it works and how it performed across our experiments.