generative ai
128 TopicsBuild Sensitivity Label‑Aware, Secure RAG with Azure AI Search and Purview
Learn why many Azure AI Search developers unknowingly miss sensitivity‑labeled data—and how integrating Microsoft Purview sensitivity labels enables more secure, complete retrieval for RAG, copilot-style apps, and enterprise search. This post explains what sensitivity labels are, why they matter for Azure AI Search, what happens when the integration is not enabled, and how label-aware indexing and query‑time enforcement ensure label-protected documents are indexed, governed, and retrieved correctly in Azure AI Search when retrieved from SharePoint, OneLake, ADLS Gen2, and Azure Blob Storage. Includes supported sources, end‑to‑end flow, and links to documentation, demo apps, and video resources.1.4KViews4likes1CommentFrom Manual Document Processing to AI-Orchestrated Intelligence
Building an IDP Pipeline with Azure Durable Functions, DSPy, and Real-Time AI Reasoning The Problem Think about what happens when a loan application, an insurance claim, or a trade finance document arrives at an organisation. Someone opens it, reads it, manually types fields into a system, compares it against business rules, and escalates for approval. That process touches multiple people, takes hours or days, and the accuracy depends entirely on how carefully it's done. Organizations have tried to automate parts of this before — OCR tools, templated extraction, rule-based routing. But these approaches are brittle. They break when the document format changes, and they can't reason about what they're reading. The typical "solution" falls into one of two camps: Manual processing. Humans read, classify, and key in data. Accurate but slow, expensive, and impossible to scale. Single-model extraction. Throw an OCR/AI model at the document, trust the output, push to downstream systems. Fast but fragile — no validation, no human checkpoint, no confidence scoring. What's missing is the middle ground: an orchestrated, multi-model pipeline with built-in quality gates, real-time visibility, and the flexibility to handle any document type without rewriting code. That's what IDP Workflow is — a six-step AI-orchestrated pipeline that processes documents end to end, from a raw PDF to structured, validated data, with human oversight built in. This isn't automation replacing people. It's AI doing the heavy lifting and humans making the final call. Architecture at a Glance POST /api/idp/start → Step 1 PDF Extraction (Azure Document Intelligence → Markdown) → Step 2 Classification (DSPy ChainOfThought) → Step 3 Data Extraction (Azure Content Understanding + DSPy LLM, in parallel) → Step 4 Comparison (field-by-field diff) → Step 5 Human Review (HITL gate — approve / reject / edit) → Step 6 AI Reasoning Agent (validation, consolidation, recommendations) → Final structured result The backend is Azure Durable Functions (Python) on Flex Consumption — customers only pay for what they use, and it scales automatically. The frontend is a Next.js dashboard with SignalR real-time updates and a Reaflow workflow visualization. Every step broadcasts stepStarted → stepCompleted / stepFailed events so the UI updates as work progresses. The pattern applies wherever organisations receive high volumes of unstructured documents that need to be classified, data-extracted, validated, and approved. The Six Steps, Explained Step 1: PDF → Markdown We use Azure Document Intelligence with the prebuilt-layout model to convert uploaded PDFs into structured Markdown — preserving tables, headings, and reading order. Markdown turns out to be a much better intermediate representation for LLMs than raw text or HTML. class PDFMarkdownExtractor: async def extract(self, pdf_path: str) -> tuple[PDFContent, Step01Output]: poller = self.client.begin_analyze_document( "prebuilt-layout", analyze_request=AnalyzeDocumentRequest(url_source=pdf_path), output_content_format=DocumentContentFormat.MARKDOWN, ) result: AnalyzeResult = poller.result() # Split into per-page Markdown chunks... Output: Per-page Markdown content, total page count, and character stats. Step 2: Document Classification (DSPy) Rather than hard-coding classification rules, we use DSPy with ChainOfThought prompting. DSPy lets us define classification as a signature — a declarative input/output contract — and the framework handles prompt optimization. class DocumentClassificationSignature(dspy.Signature): """Classify document page into predefined categories.""" page_content: str = dspy.InputField(desc="Markdown content of the document page") available_categories: str = dspy.InputField(desc="Available categories") classification: DocumentClassificationOutput = dspy.OutputField() Categories are loaded from a domain-specific classification_categories.json. Adding new categories means editing a JSON file, not code. Critically, classification is per-page, not per-document. A multi-page loan application might contain a loan form on page 1, income verification on page 2, and a property valuation on page 3 — each classified independently with its own confidence score and detected field indicators. This means multi-section documents are handled correctly downstream. Why DSPy? It gives us structured, typed outputs via Pydantic models, automatic prompt optimization, and clean separation between the what (signature) and the how (ChainOfThought, Predict, etc.). Step 3: Dual-Model Extraction (Run in Parallel) This is where things get interesting. We run two independent extractors in parallel: Azure Content Understanding (CU): A specialized Azure service that takes the raw PDF and applies a domain-specific schema to extract structured fields. DSPy LLM Extractor: Uses the Markdown from Step 1 with a dynamically generated Pydantic model (built from the domain's extraction_schema.json) to extract the same fields via an LLM. The LLM provider is selectable at runtime — Azure OpenAI, Claude, or open-weight models deployed on Azure (Qwen, DeepSeek, Llama, Phi, and more from the Azure AI Model Catalog). # In the orchestrator — fire both tasks at once azure_task = context.call_activity("activity_step_03_01_azure_extraction", input) dspy_task = context.call_activity("activity_step_03_02_dspy_extraction", input) results = yield context.task_all([azure_task, dspy_task]) Both extractors use the same domain-specific schema but approach the problem differently. Running two models gives us a natural cross-check: if both extractors agree on a field value, confidence is high. If they disagree, we know exactly where to focus human attention — not the entire document, just the specific fields that need it. Multi-Provider LLM Support The DSPy extraction and classification steps aren't locked to a single model provider. From the dashboard, users can choose between: Azure OpenAI in Foundry Models — GPT-4.1, o3-mini (default) Claude on Azure — Anthropic's Claude models Foundry Models — Open-weight models deployed on Azure via Foundry Models: Qwen 2.5 72B, DeepSeek V3/R1, Llama 3.3 70B, Phi-4, and more The third option is key: instead of routing through a third-party service, you deploy open-weight models directly on Azure as serverless API endpoints through Azure AI Foundry. These endpoints expose an OpenAI-compatible API, so DSPy talks to them the same way it talks to GPT-4.1 — just with a different api_base. You get the model diversity of the open-weight ecosystem with Azure's enterprise security, compliance, and network isolation. A factory pattern in the backend resolves the selected provider and model at runtime, so switching from Azure OpenAI to Qwen on Azure AI is a single dropdown change — no config edits, no redeployment. This makes it easy to benchmark different models against the same extraction schema and compare quality. Step 4: Field-by-Field Comparison The comparator aligns the outputs of both extractors and produces a diff report: matching fields, mismatches, fields found by only one extractor, and a calculated match percentage. This feeds directly into the human review step. Output: "Match: 87.5% (14/16 fields)" Step 5: Human-in-the-Loop (HITL) Gate The pipeline pauses and waits for a human decision. The Durable Functions orchestrator uses wait_for_external_event() with a configurable timeout (default: 24 hours) implemented as a timer race: review_event = context.wait_for_external_event(HITL_REVIEW_EVENT) timeout = context.create_timer( context.current_utc_datetime + timedelta(hours=HITL_TIMEOUT_HOURS) ) winner = yield context.task_any([review_event, timeout]) The frontend shows a side-by-side comparison panel where reviewers can see both values for each disputed field — pick Azure's value, the LLM's value, or type in a correction. They can add notes explaining their decision, then approve or reject. If nobody responds within the timeout, it auto-escalates (configurable behavior). The orchestrator doesn't poll. It doesn't check a queue. The moment the reviewer submits their decision, the pipeline resumes automatically — using Durable Functions' native external event pattern. Step 6: AI Reasoning Agent The final step uses an AI agent with tool-calling to perform structured validation, consolidate field values, and generate a confidence score. This isn't just a prompt — it's an agent backed by the Microsoft Agent Framework with purpose-built tools: validate_fields — runs domain-specific validation rules (data types, ranges, cross-field logic) consolidate_extractions — merges Azure CU + DSPy outputs using confidence-weighted selection generate_summary — produces a natural-language summary with recommendations The reasoning step can use standard models or reasoning-optimised models like o3 or o3-mini for higher-stakes validation. The agent streams its reasoning process to the frontend in real time — validation results, confidence scoring, and recommendations all appear as they're generated. Domain-Driven Design: Zero-Code Extensibility One of the most powerful design choices: adding a new document type requires zero code changes. Each domain is a folder under idp_workflow/domains/ with four JSON files: idp_workflow/domains/insurance_claims/ ├── config.json # Domain metadata, thresholds, settings ├── classification_categories.json # Page-level classification taxonomy ├── extraction_schema.json # Field definitions (used by both extractors) └── validation_rules.json # Business rules for the reasoning agent The extraction_schema.json is particularly interesting — it's consumed by both the Azure CU service (which builds an analyzer from it) and the DSPy extractor (which dynamically generates a Pydantic model at runtime): def create_extraction_model_from_schema(schema: dict) -> type[BaseModel]: """Dynamically create a Pydantic model from an extraction schema JSON.""" # Maps schema field definitions → Pydantic field annotations # Supports nested objects, arrays, enums, and optional fields We currently ship four domains out of the box: insurance claims, home loans, small business lending, and trade finance. See It In Action: Processing a Home Loan Application To make this concrete, here's what happens when you process a multi-page home loan PDF — personal details, financial tables, and mixed content. Upload & Extract. The document hits the dashboard and Step 1 kicks off. Azure Document Intelligence converts all pages to structured Markdown, preserving tables and layout. You can preview the Markdown right in the detail panel. Per-Page Classification. Step 2 classifies each page independently: Page 1 is a Loan Application Form, Page 2 is Income Verification, Page 3 is a Property Valuation. Each has its own confidence score and detected fields listed. Dual Extraction. Azure CU and the DSPy LLM extractor run simultaneously. You can watch both progress bars in the dashboard. Comparison. The system finds 16 fields total. 14 match between the two extractors. Two fields differ — the annual income figure and the loan term. Those are highlighted for review. Human Review. The reviewer sees both values side by side for each disputed field, picks the correct value (or types a correction), adds a note, and approves. The moment they submit, the pipeline resumes — no polling. AI Reasoning. The agent validates against home loan business rules: loan-to-value ratio, income-to-repayment ratio, document completeness. Validation results stream in real time. Final output: 92% confidence, 11 out of 12 validations passed. The AI flags a minor discrepancy in employment dates and recommends approval with a condition to verify employment tenure. Result: A document that would take 30–45 minutes of manual processing, handled in under 2 minutes — with complete traceability. Every step, every decision, timestamped in the event log. Real-Time Frontend with SignalR Every orchestration step broadcasts events through Azure SignalR Service, targeted to the specific user who started the workflow: def _broadcast(context, user_id, event, data): return context.call_activity("notify_user", { "user_id": user_id, "instance_id": context.instance_id, "event": event, "data": data, }) The frontend generates a session-scoped userId, passes it via the x-user-id header during SignalR negotiation, and receives only its own workflow events. No Pub/Sub subscriptions to manage. The Next.js frontend uses: Zustand + Immer for state management (4 stores: workflow, events, reasoning, UI) Reaflow for the animated pipeline visualization React Query for data fetching Tailwind CSS for styling The result is a dashboard where you can upload a document and watch each pipeline step execute in real time. Infrastructure: Production-Ready from Day One The entire stack deploys with a single command using Azure Developer CLI (azd): azd up What gets provisioned: Resource Purpose Azure Functions (Flex Consumption) Backend API + orchestration Azure Static Web App Next.js frontend Durable Task Scheduler Orchestration state management Storage Account Document blob storage Application Insights Monitoring and diagnostics Network Security Perimeter Storage network lockdown Infrastructure is defined in Bicep with: Parameterized configuration (memory, max instances, retention) RBAC role assignments via a consolidated loop Two-region deployment (Functions + SWA have different region availability) Network Security Perimeter deployed in Learning mode, switched to Enforced post-deploy Key Engineering Decisions Why Durable Functions? Orchestrating a multi-step pipeline with parallel execution, external event gates, timeouts, and retry logic is exactly what Durable Functions was designed for. The orchestrator is a Python generator function — each yield is a checkpoint that survives process restarts: def idp_workflow_orchestration(context: DurableOrchestrationContext): step1 = yield from _execute_step(context, ...) # PDF extraction step2 = yield from _execute_step(context, ...) # Classification results = yield context.task_all([azure_task, dspy_task]) # Parallel extraction # ... HITL gate, reasoning agent, etc. No external queue management. No state database. No workflow engine to operate. Why Dual Extraction? Running two independent models on the same document gives us: Cross-validation — agreement between models is a strong confidence signal Coverage — one model might extract fields the other misses Auditability — human reviewers can see both outputs side by side Graceful degradation — if one service is down, the other still produces results Why DSPy over Raw Prompts? DSPy provides: Typed I/O — Pydantic models as signatures, not string parsing Composability — ChainOfThought, Predict, ReAct are interchangeable modules Prompt optimization — once you have labeled examples, DSPy can auto-tune prompts LM scoping — with dspy.context(lm=self.lm): isolates model configuration per call Getting Started # Clone git clone https://github.com/lordlinus/idp-workflow.git cd idp-workflow # DTS Emulator (requires Docker) docker run -d -p 8080:8080 -p 8082:8082 \ -e DTS_TASK_HUB_NAMES=default,idpworkflow \ mcr.microsoft.com/dts/dts-emulator:latest # Backend python -m venv .venv && source .venv/bin/activate pip install -r requirements.txt func start # Frontend (separate terminal) cd frontend && npm install && npm run dev You'll also need Azurite (local storage emulator) running, plus Azure OpenAI, Document Intelligence, Content Understanding, and SignalR Service endpoints configured in local.settings.json. See the Local Development Guide for the full setup. Who Is This For? If any of these sound familiar, IDP Workflow was built for you: "We're drowning in documents." — High-volume document intake with manual processing bottlenecks. "We tried OCR but it breaks on new formats." — Brittle extraction that fails when layouts change. "Compliance needs an audit trail for every decision." — Regulated industries where traceability is non-negotiable. This is an AI-powered document processing platform — not a point OCR tool — with human oversight, dual AI validation, and domain extensibility built in from day one. What's Next Prompt optimization — using DSPy's BootstrapFewShot with domain-specific training examples Batch processing — fan-out/fan-in orchestration for processing document queues Custom evaluators — automated quality scoring per domain Additional domains — community-contributed domain configurations Try It Out The project is fully open source: github.com/lordlinus/idp-workflow Deploy to your own Azure subscription with azd up, upload a PDF from the sample_documents/ folder, and watch the pipeline run. We'd love feedback, contributions, and new domain configurations. Open an issue or submit a PR!362Views0likes1CommentIntroducing Phi-4-Reasoning-Vision to Microsoft Foundry
Vision reasoning models unlock a critical capability for developers: the ability to move beyond passive perception toward systems that can understand, reason over, and act on visual information. Instead of treating images, diagrams, documents, or UI screens as unstructured inputs, vision reasoning models enable developers to build applications that can interpret visual structure, connect it with textual context, and perform multi-step reasoning to reach actionable conclusions. Today, we are excited to announce Phi-4-Reasoning-Vision-15B is available in Microsoft Foundry and Hugging Face. This model brings high‑fidelity vision to the reasoning‑focused Phi‑4 family, extending small language models (SLMs) beyond perception into structured, multi‑step visual reasoning for agents, analytical tools, and scientific workflows. What’s new? The Phi model family has advanced toward combining efficient visual understanding with strong reasoning in small language models. Earlier Phi‑4 models demonstrated reliable perception and grounding across images and text, while later iterations introduced structured reasoning to improve performance on complex tasks. Phi‑4‑reasoning-vision-15B brings these threads together, pairing high‑resolution visual perception with selective, task‑aware reasoning. As a result, the model can reason deeply when needed while remaining fast and efficient for perception‑focused scenarios—making it well suited for interactive, real‑world applications. Key capabilities Reasoning behavior is explicitly enabled via prompting: Developers can explicitly enable or disable reasoning to balance latency and accuracy at runtime. Optimized for vision reasoning and can be used for: diagram-based math, document, chart, and table understanding, GUI interpretations and grounding for agent scenarios to interpret screens and actions, Computer-use agent scenarios, and General image chat and answering questions Benchmarks The following results summarize Phi-4-reasoning-vision-15B performance across a set of established multimodal reasoning, mathematics, and computer use benchmarks. The following benchmarks are the result of internal evaluations. Benchmark Phi-4-reasoning-vision-15B Phi-4-reasoning-vision-15B – force no think Phi-4-mm-instruct Kimi-VL-A3B-Instruct gemma-3-12b-it Qwen3-VL-8B-Instruct-4K Qwen3-VL-8B-Instruct-32K Qwen3-VL-32B-Instruct-4K Qwen3-VL-32B-Instruct-32K AI2D _TEST 84.8 84.7 68.6 84.6 80.4 82.7 83 84.8 85 ChartQA _TEST 83.3 76.5 23.5 87 39 83.1 83.2 84.3 84 HallusionBench 64.4 63.1 56 65.2 65.3 73.5 74.1 74.4 74.9 MathVerse _MINI 44.9 43.8 32.4 41.7 29.8 54.5 57.4 64.2 64.2 MathVision _MINI 36.2 34.2 20 28.3 31.9 45.7 50 54.3 60.5 MathVista _MINI 75.2 68.7 50.5 67.1 57.4 77.1 76.4 82.5 81.8 MMMU _VAL 54.3 52 42.3 52 50 60.7 64.6 68.6 70.6 MMStar 64.5 63.3 45.9 60 59.4 68.9 69.9 73.7 74.3 OCRBench 76 75.6 62.6 86.5 75.3 89.2 90 88.5 88.5 ScreenSpot _v2 88.2 88.3 28.5 89.8 3.5 91.5 91.5 93.7 93.9 Table 1: Accuracy comparisons relative to popular open-weight, non-thinking models Benchmark Phi-4-reasoning-vision-15B Phi-4-reasoning-vision-15B - force thinking Kimi-VL-A3B-Thinking gemma-3-12b-it Qwen3-VL-8B-Thinking-4K Qwen3-VL-8B-Thinking-40K Qwen3-VL-32B-Thiking-4K Qwen3-VL-32B-Thinking-40K AI2D_TEST 84.8 79.7 81.2 80.4 83.5 83.9 86.9 87.2 ChartQA _TEST 83.3 82.9 73.3 39 78 78.6 78.5 79.1 HallusionBench 64.4 63.9 70.6 65.3 71.6 73 76.4 76.6 MathVerse _MINI 44.9 53.1 61 29.8 67.3 73.3 78.3 78.2 MathVision _MINI 36.2 36.2 50.3 31.9 43.1 50.7 60.9 58.6 MathVista _MINI 75.2 74.1 78.6 57.4 77.7 79.5 83.9 83.8 MMMU _VAL 54.3 55 60.2 50 59.3 65.3 72 72.2 MMStar 64.5 63.9 69.6 59.4 69.3 72.3 75.5 75.7 OCRBench 76 73.7 79.9 75.3 81.2 82 83.7 85 ScreenSpot _v2 88.2 88.1 81.8 3.5 93.3 92.7 83.1 83.1 Table 2: Accuracy comparisons relative to popular open-weight, thinking models All results were obtained using a consistent evaluation setup and prompts across models; numbers are provided for comparison and analysis rather than as leaderboard claims. For more information regarding benchmarks and evaluations, please read the technical paper on the Microsoft Research hub. Suggested use cases and applications Phi‑4‑Reasoning-Vision-15B supports applications that require both high‑fidelity visual perception and structured inference. Two representative scenarios include scientific and mathematical reasoning over visual inputs, and computer‑using agents (CUAs) that operate directly on graphical user interfaces. In both cases, the model provides grounded visual understanding paired with controllable, low‑latency reasoning suitable for interactive systems. Computer use agents in retail scenarios For computer use agents, Phi‑4‑Reasoning-Vision-15B provides the perception and grounding layer required to understand and act within live ecommerce interfaces. For example, in an online shopping experience, the model interprets screen content—products, prices, filters, promotions, buttons, and cart state—and produces grounded observations that agentic models like Fara-7B can use to select actions. Its compact size and low latency inference make it well suited for CUA workflows and agentic applications. Visual reasoning for education Another practical use of visual reasoning models is education. A developer could build a K‑12 tutoring app with Phi‑4‑Reasoning‑Vision‑15B where students upload photos of worksheets, charts, or diagrams to get guided help—not answers. The model can understand the visual content, identify where the student went wrong, and explain the correct steps clearly. Over time, the app can adapt by serving new examples matched to the student’s learning level, turning visual problem‑solving into a personalized learning experience. Microsoft Responsible AI principles At Microsoft, our mission to empower people and organizations remains constant—especially in the age of AI, where the potential for human achievement is greater than ever. We recognize that trust is foundational to AI adoption, and earning that trust requires a commitment to transparency, safety, and accountability. As with other Phi models, Phi-4-Reasoning-Vision-15B was developed with safety as a core consideration throughout training and evaluation. The model was trained on a mixture of public safety datasets and internally generated examples designed to elicit behaviors the model should appropriately refuse, in alignment with Microsoft’s Responsible AI Principles. These safety focused training signals help the model recognize and decline requests that fall outside intended or acceptable use. Additional details on the model’s safety considerations, evaluation approach, and known limitations are provided in the accompanying technical blog and model card. Getting started Start using Phi‑4‑Reasoning-Vision-15B in Microsoft Foundry today. Microsoft Foundry provides a unified environment for model discovery, evaluation, and deployment, making it straightforward to move from initial experimentation to production use while applying appropriate safety and governance practices. Deploy the new model on Microsoft Foundry. Learn more about the Phi family on Foundry Labs and in the Phi Cookbook Connect to the Microsoft Developer Community on Discord Read the technical paper on Microsoft Research Read more use cases on the Educators Developer blog851Views0likes0CommentsMicrosoft Foundry: An End-to-End Platform for Building, Governing, and Scaling AI
Microsoft Foundry: What It Is and How to Get Started As organizations accelerate their adoption of AI, one challenge consistently emerges: how to move from experimentation to production at scale in a secure, responsible, and efficient way. Microsoft Foundry exists to address exactly that challenge. What Is Microsoft Foundry? Microsoft Foundry is an end-to-end platform experience that brings together Microsoft’s AI development, deployment, and governance capabilities into a unified environment. It enables developers, data scientists, and enterprises to build, customize, deploy, and operate AI solutions, including generative AI, using Microsoft and partner models, tools, and services. Rather than being a single product, Foundry is a curated and integrated experience that spans model access, tooling, orchestration, evaluation, and enterprise-grade controls. Why Microsoft Foundry Exists AI innovation is moving fast, but enterprise adoption requires more than just access to models. Customers need: A consistent way to work with multiple models, including Microsoft, OpenAI, and open-source models Built-in security, compliance, and responsible AI capabilities Tooling that supports the full AI lifecycle, not just prototyping Seamless integration with existing data platforms, applications, and cloud operations Microsoft Foundry was created to reduce friction between experimentation and real-world deployment while aligning AI development with enterprise standards, governance, and scale. What’s Included in Microsoft Foundry? Microsoft Foundry brings together several key capabilities: 1. Model Choice and Flexibility Access to leading foundation models, including Azure OpenAI models and selected open-source models Ability to evaluate and select models based on performance, cost, and use case 2. AI Development and Orchestration Tools Prompt engineering, fine-tuning, and grounding with enterprise data Tools for building copilots, chat experiences, and AI-powered applications Orchestration across tools, APIs, and workflows 3. Evaluation, Safety, and Responsible AI Built-in evaluation for quality, latency, and cost Content safety, monitoring, and governance controls Alignment with Microsoft’s Responsible AI principles 4. Enterprise-Grade Platform Integration Native integration with Azure, Microsoft Fabric, Power Platform, and developer tools Identity, security, and compliance through Microsoft Entra and Azure controls Observability and lifecycle management for production workloads How to Get Started Getting started with Microsoft Foundry is straightforward: Start in Azure - Use Azure as the control plane to access Foundry experiences and services. Explore models and tools - Experiment with available models, build prompts, and prototype AI workflows. Ground with your data - Connect enterprise data securely to create more relevant and contextual AI experiences. Evaluate and deploy - Use built-in evaluation and safety tools, then deploy AI solutions into production with confidence. Scale responsibly - Apply governance, monitoring, and cost controls as adoption grows. Final Thoughts Microsoft Foundry represents Microsoft’s vision for enterprise AI done right. It offers flexible model choice, strong development tooling, and built-in trust. By unifying AI development and operations into a single experience, Foundry helps organizations move faster while staying secure, compliant, and future-ready. Whether you are just starting with generative AI or scaling existing solutions, Microsoft Foundry provides a practical foundation to build on.717Views0likes0CommentsAnnouncing extended support for Fine Tuning gpt-4o and gpt-4o-mini
At Build 2025, we announced post-retirement, extended deployment and inference support for fine tuned models. Today, we’re excited to announce we’re extending fine-tuning training for current customers of our most popular Azure OpenAI models: gpt-4o (2024-08-06) and gpt-4o-mini (2024-07-18). Hundreds of customers have pushed trillions of tokens through fine-tuned versions of these models and we’re happy to provide even more runway for your AI agents and applications. Already using these models in Foundry? We have you covered as the only provider of fine tuning gpt-4o and gpt-4o-mini come April. Keep fine tuning! Not yet using Microsoft Foundry? Get started today by migrating your training data to Microsoft Foundry and fine tune using Global or Standard Training for gpt-4o and gpt-4o-mini using your existing OpenAI code. You’ll have the runway to continuously fine tune or update your models. You have until March 31 st , 2026, to become a fine-tuning customer of these models. Model Version Training retirement date Deployment retirement date gpt-4o 2024-08-06 No earlier than 2026-09-31 1 2027-03-31 gpt-4o-mini 2024-07-18 No earlier than 2026-09-31 1 2027-03-31 gpt-4.1 2025-04-14 At base model retirement One year after training retirement gpt-4.1-mini 2025-04-14 At base model retirement One year after training retirement gpt-4.1-nano 2025-04-14 At base model retirement One year after training retirement o4-mini 2025-04-16 At base model retirement One year after training retirement 1 For existing customers only. Otherwise, training retirement occurs at base model retirement301Views0likes0CommentsNew Azure Open AI models bring fast, expressive, and real‑time AI experiences in Microsoft Foundry
Modern AI applications, whether voice‑first experiences or building large software systems, rarely fit into a single prompt. Real work unfolds over time: maintaining context, following instructions, invoking tools, and adapting as requirements evolve. When these foundations break down through latency spikes, instruction drift, or unreliable tool calls, both user conversations and developer workflows are impacted. OpenAI’s latest models address this shared challenge by prioritizing continuity and reliability across real‑time interaction and long‑running engineering tasks. Starting today, GPT-Realtime-1.5, GPT-Audio-1.5, and GPT-5.3-Codex are rolling out into Microsoft Foundry. Together, these models reflect the growing needs of the modern developer and push the needle from short, stateless interactions toward AI systems that can reason, act, and collaborate over time. GPT-5.3-Codex at a glance GPT‑5.3‑Codex brings together advanced coding capability with broader reasoning and professional problem solving in a single model built for real engineering work. It unifies the frontier coding performance of GPT-5.2-Codex with the reasoning and professional knowledge capabilities of GPT5.2 in one system. This shifts the experience from optimizing isolated outputs to supporting longer running development efforts; where repositories are large, changes span multiple steps, and requirements aren’t always fully specified at the start. What’s improved Model experiences 25% faster execution time, according to Open AI, than its predecessors so developers can accelerate development of new applications. Built for long-running tasks that involve research, tool use, and complex, multi‑step execution while maintaining context. Midtask steerability and frequent updates allow developers to redirect and collaborate with the model as it works without losing context. Stronger computer-use capabilities allow developers to execute across the full spectrum of technical work. Common use cases Developers and teams can apply GPT‑5.3‑Codex across a wide range of scenarios, including: Refactoring and modernizing large or legacy applications Performing multi‑step migrations or upgrades Running agentic developer workflows that span analysis, implementation, testing, and remediation Automating code reviews, test generation, and defect detection Supporting development in security‑sensitive or regulated environments Pricing Model Input Price/1M Tokens Cached Input Price/1M Tokens Output Price/1M Tokens GPT-5.3-Codex $1.75 $0.175 $14.00 GPT-Realtime-1.5 and GPT-Audio-1.5 at a glance The models deliver measurable gains in reasoning and speech understanding for real‑time voice interactions on Microsoft Foundry. In OpenAI’s evaluations, it shows a +5% lift on Big Bench Audio (reasoning), a +10.23% improvement in alphanumeric transcription, and a +7% gain in instruction following, while maintaining low‑latency performance. Key improvements include: What's improved More natural‑sounding speech: Audio output is smoother and more conversational, with improved pacing and prosody. Higher audio quality: Clearer, more consistent audio output across supported voices. Improved instruction following: Better alignment with developer‑provided system and user instructions during live interactions. Function calling support: Enables structured, tool‑driven interactions within real‑time audio flows. Common use cases Developers are using GPT-Realtime-1.5 and GPT-Audio-1.5 for scenarios where low‑latency voice interaction is essential, including: Conversational voice agents for customer support or internal help desks Voice‑enabled assistants embedded in applications or devices Live voice interfaces for kiosks, demos, and interactive experiences Hands‑free workflows where audio input and output replace keyboard interaction Pricing Model Text Audio Image Input Cached Input Output Input Cached Input Output Input Cached Input Output GPT-Realtime-1.5 $4.00 $0.04 $16.0 $32.0 $0.40 $64.00 $4.00 $0.04 $16.0 GPT-Audio-1.5 $2.50 n/a $10.0 $32.00 n/a $64.00 $2.50 n/a $10.0 Getting started in Microsoft Foundry Start building in Microsoft Foundry, evaluate performance, and explore Azure Open AI models today. Foundry brings evaluation, deployment, and governance into a single workflow, helping teams progress from experiments to scalable applications while maintaining security and operational controls.8.9KViews1like0CommentsBuilding Knowledge-Grounded Conversational AI Agents with Azure Speech Photo Avatars
From Chat to Presence: The Next Step in Conversational AI Chat agents are now embedded across nearly every industry, from customer support on websites to direct integrations inside business applications designed to boost efficiency and productivity. As these agents become more capable and more visible, user expectations are also rising: conversations should feel natural, trustworthy, and engaging. While text‑only chat agents work well for many scenarios, voice‑enabled agents take a meaningful step forward by introducing a clearer persona and a stronger sense of presence, making interactions feel more human and intuitive (see healow Genie success story). In domains such as Retail, Healthcare, Education, and Corporate Training, adding a visual dimension through AI avatars further elevates the experience. Pairing voice with a lifelike visual representation improves inclusiveness, reduces interaction friction, and helps users better contextualize conversations—especially in scenarios that rely on trust, guidance, or repeated engagement. To support these experiences, Microsoft offers two AI avatar options through Azure Speech: Video Avatars, which are generally available and provide full‑ or partial‑body immersive representations, and Photo Avatars, currently in public preview, which deliver a headshot‑style visual well suited for web‑based agents and digital twin scenarios. Both options support custom avatars, enabling organizations to reflect their brand identity rather than relying solely on generic representations (see W2M custom video avatar). Choosing between Video Avatars and Photo Avatars is less about preference and more about intent. Video Avatars offer higher visual fidelity and immersion but require more extensive onboarding, such as high-quality recorded video of an avatar talent. Photo Avatars, by contrast, can be created from a single image, enabling a lighter‑weight onboarding process while still delivering a human‑centered experience. The right choice depends on the desired interaction style, visual presence, and target deployment scenario. What this solution demonstrates In this post, I walk through how to integrate Azure Speech Photo Avatars — powered by Microsoft Research's VASA-1 model — into a knowledge‑grounded conversational AI agent built on Azure AI Search. The goal is to show how voice, visuals, and retrieval‑augmented generation (RAG) can come together to create a more natural and engaging agent experience. The solution exposes a web‑based interface where users can speak naturally to the AI agent using their voice. The agent responds in real time using synthesized speech, while live transcriptions of the conversation are displayed in the UI to improve clarity and accessibility. To help compare different interaction patterns, the sample application supports three modes: 1) Photo Avatar mode, which adds a lifelike visual presence. 2) Video Avatar mode, which provides a more immersive, full‑motion experience. 3) Voice‑only mode, which focuses purely on speech‑to‑speech interaction. Key architectural components An end‑to‑end architecture for the solution is shown in the diagram below. The solution is composed of the following core services and building blocks: Microsoft Foundry — provides the platform for deploying, managing, and accessing the foundation models used by the application. Azure OpenAI — provides the Realtime API for speech‑to‑speech interaction in the voice‑only mode and the Chat Completions API used by backend services for reasoning and conversational responses. gpt‑4.1 — LLM used for reasoning tasks such as deciding when to invoke tool calls and summarizing responses. gpt-realtime-mini — LLM used for speech-to-speech interaction in the Voice-only mode. text‑embedding‑3‑large — LLM used for generating vector embeddings used in retrieval‑augmented generation. Azure Speech — delivers the real‑time speech‑to‑text (STT), text‑to‑speech (TTS), and AI avatars capabilities for both Photo Avatar and Video Avatar experiences. Azure Document Intelligence — extracts structured text, layout, and key information from source documents used to build the knowledge base. Azure AI Search — provides vector‑based retrieval to ground the language model with relevant, context‑aware content. Azure Container Apps — hosts the web UI frontend, backend services, and MCP server within a managed container runtime. Azure Container Apps Environment — defines a secure and isolated boundary for networking, scaling, and observability of the containerized workloads. Azure Container Registry — stores and manages Docker images used by the container applications. How you can try it yourself The complete sample implementation is available in the LiveChat AI Voice Assistant repository, which includes instructions for deploying the solution into your Azure environment. The repository uses Infrastructure as Code (IaC) deployment via Azure Developer CLI (azd) to orchestrate Azure resource provisioning and application deployment. Prerequisites: An Azure subscription with appropriate services and models' quota is required to deploy the solution. Getting the solution up and running in just three simple steps: Clone the repository and navigate to the project git clone https://github.com/mardianto-msft/azure-speech-ai-avatars.git cd azure-speech-ai-avatars Authenticate with Azure azd auth login Initialize and deploy the solution azd up Once deployed, you can access the sample application by opening the frontend service URL in a web browser. To demonstrate knowledge grounding, the sample includes source documents derived from Microsoft’s 2025 Annual Report and Shareholder Letter. These grounding documents can optionally be replaced with your own data, allowing the same architecture to be reused for domain‑specific or enterprise scenarios. When using the provided sample documents, you can ask questions such as: “How much was Microsoft’s net income in 2025?”, “What are Microsoft’s priorities according to the shareholder letter?”, “Who is Microsoft’s CEO?” Bringing Conversational AI Agents to Life This implementation of Azure Speech Photo Avatars serves as a practical starting point for building more engaging, knowledge‑grounded conversational AI agents. By combining voice interaction, visual presence, and retrieval‑augmented generation, Photo Avatars offer a lightweight yet powerful way to make AI agents feel more approachable, trustworthy, and human‑centered — especially in web‑based and enterprise scenarios. From here, the solution can be extended over time with capabilities such as long‑term memory, richer personalization, or more advanced multi‑agent orchestration. Whether used as a reference architecture or as the foundation for a production system, this approach demonstrates how Azure Speech Photo Avatars can help bridge the gap between conversational intelligence and meaningful user experience. By emphasizing accessibility, trust, and human‑centered design, it reflects Microsoft’s broader mission to empower every person and every organization on the planet to achieve more.386Views0likes0CommentsWhat’s trending on Hugging Face: PubMedBERT Base Embeddings, Paraphrase Multilingual MiniLM, BGE-M3
The embedding model landscape has evolved beyond one-size-fits-all solutions. Today’s developers navigate a set of deliberate trade‑offs: domain specialization to improve accuracy in vertical applications, multilingual capabilities to support global use cases, and retrieval strategies that optimize performance at scale. Once a model demonstrates strong semantic performance, predictable behavior, and broad community support, it often becomes a trusted reference baseline that developers build around and deploy with confidence. This week, we’re not spotlighting models that are new to Microsoft Foundry. Instead, we’re turning our attention to models that have managed to stay relevant in a rapidly expanding sea of options. This week's Model Monday's edition highlights three Hugging Face models including NeuML's PubMedBERT Base Embeddings for domain-specific medical text understanding, Sentence Transformers' Paraphrase Multilingual MiniLM for lightweight cross-lingual semantic similarity, and BAAI's BGE-M3 for multi-functional long-context retrieval across 100+ languages. Models of the week NeuML: PubMedBERT Base Embeddings Model Specs Parameters / size: 109M Context length: 512 tokens Primary task: Embeddings (medical domain) Why it's interesting Domain-specific performance gains: Fine-tuned on PubMed title-abstract pairs, achieving 95.62% average Pearson correlation across medical benchmarks—outperforming general-purpose models like gte-base (95.37%), bge-base-en-v1.5 (93.78%), and all-MiniLM-L6-v2 (93.46%) on medical literature tasks Production-validated for medical RAG: With 141K downloads and deployment in 30+ medical AI applications, this model demonstrates consistent real-world performance for clinical research, drug discovery, and biomedical semantic search pipelines Built on Microsoft's BiomedNLP foundation: Extends BioMed BERT family with sentence-transformers mean pooling, creating 768-dimensional embeddings optimized for medical literature clustering and retrieval Try it Clinical research sample prompt: Industry specific sample prompt: You're building a clinical decision support system for oncology. Deploy PubMedBERT Base Embeddings in Microsoft Foundry to index 50,000 recent cancer research abstracts from PubMed. A physician queries: "What are the cardiotoxicity risks of combining checkpoint inhibitors with anthracycline chemotherapy in elderly patients?" Embed the query, retrieve the top 10 most semantically similar abstracts using cosine similarity, and return citations with PubMed IDs for evidence-based treatment planning. Sentence Transformers: Paraphrase Multilingual MiniLM L12 v2 Model Specs Parameters / size: 117M Context length: 128 tokens Primary task: Embeddings (multilingual, sentence similarity) Why it's interesting Multilingual adoption: Supports 50+ languages including Arabic, Chinese, Hebrew, Hindi, Japanese, Korean, Russian, Thai, and Vietnamese—with 18.4 million downloads last month demonstrating production-scale validation across global deployments Compact architecture for edge deployment: At 117M parameters producing 384-dimensional embeddings, this model balances multilingual coverage with inference efficiency, making it ideal for resource-constrained environments or high-throughput applications Sentence-BERT foundation: Based on the influential Sentence-BERT paper (Reimers & Gurevych, 2019), using siamese BERT networks with mean pooling to create semantically meaningful sentence embeddings for clustering, paraphrase detection, and cross-lingual search Community-proven versatility: With 299 fine-tuned variants and 100+ Spaces implementations, this model serves as a peer reviewed starting point for multilingual semantic similarity tasks, from customer support ticket routing to cross-lingual document retrieval Try it E-commerce sample prompt: You're building a global customer support platform for an e-commerce company operating in 30 countries. Deploy Paraphrase Multilingual MiniLM in Microsoft Foundry to process incoming support tickets in English, Spanish, French, German, Portuguese, Japanese, and Korean. Embed each ticket as a 384-dimensional vector and cluster by semantic similarity to automatically route issues to specialized teams (payment, shipping, returns, technical). Flag duplicate tickets with cosine similarity > 0.85 to prevent redundant responses. BAAI: BGE-M3 Model Specs Parameters / size: ~560M Context length: 8192 tokens Primary task: Embeddings (multi-functional: dense, sparse, multi-vector) Why it's interesting Three retrieval modes in one model: Uniquely supports dense retrieval (1024-dim embeddings), sparse retrieval (lexical matching like BM25), and multi-vector retrieval (ColBERT-style fine-grained matching)—enabling hybrid search pipelines without maintaining separate models or indexes Exceptional long-context capability: 8192-token context window handles full documents, legal contracts, research papers, and lengthy technical content—validated on MLDR (13-language document retrieval) and NarrativeQA (long-form question answering) benchmarks Multilingual dominance: Outperforms OpenAI embeddings on MIRACL multilingual retrieval across 13+ languages and demonstrates strong zero-shot cross-lingual transfer on MKQA. Try it Legal document search sample prompt: You're building a legal document search system for a multinational law firm. Deploy BGE-M3 in Microsoft Foundry to index 5,000 full-length commercial contracts (average 6,000 tokens each) in English, French, German, and Spanish. A lawyer queries: "Find all force majeure clauses that exclude liability for pandemics or global health emergencies." Use hybrid retrieval: (1) dense embeddings for semantic similarity to capture concept variations like "Act of God" or "unforeseen circumstances", (2) sparse retrieval for exact keyword matches on "force majeure", "pandemic", "health emergency". Combine scores with weighted sum (0.6 dense + 0.4 sparse) and return top 15 contract sections with clause numbers and jurisdiction metadata. Getting started You can deploy open-source Hugging Face models directly in Microsoft Foundry by browsing the Hugging Face collection in the Foundry model catalog and deploying to managed endpoints in just a few clicks. You can also start from the Hugging Face Hub. First, select any supported model and then choose "Deploy on Microsoft Foundry", which brings you straight into Azure with secure, scalable inference already configured. Learn how to discover models and deploy them using Microsoft Foundry documentation. Follow along the Model Mondays series and access the GitHub to stay up to date on the latest Read Hugging Face on Azure docs Learn about one-click deployments from the Hugging Face Hub on Microsoft Foundry Explore models in Microsoft Foundry302Views0likes0CommentsNow in Foundry: Qwen3-Coder-Next, Qwen3-ASR-1.7B, Z-Image
This week's spotlight features three models from that demonstrate enterprise-grade AI across the full scope of modalities. From low latency coding agents to state-of-the-art multilingual speech recognition and foundation-quality image generation, these models showcase the breadth of innovation happening in open-source AI. Each model balances performance with practical deployment considerations, making them viable for production systems while pushing the boundaries of what's possible in their respective domains. This week's Model Mondays edition highlights Qwen3-Coder-Next, an 80B MoE model that activates only 3B parameters while delivering coding agent capabilities with 256k context; Qwen3-ASR-1.7B, which achieves state-of-the-art accuracy across 52 languages and dialects; and Z-Image from Tongyi-MAI, an undistilled text-to-image foundation model with full Classifier-Free Guidance support for professional creative workflows. Models of the week Qwen: Qwen3-Coder-Next Model Specs Parameters / size: 80B total (3B activated) Context length: 262,144 tokens Primary task: Text generation (coding agents, tool use) Why it's interesting Extreme efficiency: Activates only 3B of 80B parameters while delivering performance comparable to models with 10-20x more active parameters, making advanced coding agents viable for local deployment on consumer hardware Built for agentic workflows: Excels at long-horizon reasoning, complex tool usage, and recovering from execution failures, a critical capability for autonomous development that go beyond simple code completion Benchmarks: Competitive performance with significantly larger models on SWE-bench and coding benchmarks (Technical Report) Try it Use Case Prompt Pattern Code generation with tool use Provide task context, available tools, and execution environment details Long-context refactoring Include full codebase context within 256k window with specific refactoring goals Autonomous debugging Present error logs, stack traces, and relevant code with failure recovery instructions Multi-file code synthesis Describe architecture requirements and file structure expectations Financial services sample prompt: You are a coding agent for a fintech platform. Implement a transaction reconciliation service that processes batches of transactions, detects discrepancies between internal records and bank statements, and generates audit reports. Use the provided database connection tool, logging utility, and alert system. Handle edge cases including partial matches, timing differences, and duplicate transactions. Include unit tests with 90%+ coverage. Qwen: Qwen3-ASR-1.7B Model Specs Parameters / size: 1.7B Context length: 256 tokens (default), configurable up to 4096 Primary task: Automatic speech recognition (multilingual) Why it's interesting All-in-one multilingual capability: Single 1.7B model handles language identification plus speech recognition for 30 languages, 22 Chinese dialects, and English accents from multiple regions—eliminating the need to manage separate models per language Specialized audio versatility: Transcribes not just clean speech but singing voice, songs with background music, and extended audio files, expanding use cases beyond traditional ASR to entertainment and media workflows State-of-the-art accuracy: Outperforms GPT-4o, Gemini-2.5, and Whisper-large-v3 across multiple benchmarks. English: Tedlium 4.50 WER vs 7.69/6.15/6.84; Chinese: WenetSpeech 4.97/5.88 WER vs 15.30/14.43/9.86 (Technical Paper) Language ID included: 97.9% average accuracy across benchmark datasets for automatic language identification, eliminating the need for separate language detection pipelines Try it Use Case Prompt Pattern Multilingual transcription Send audio files via API with automatic language detection Call center analytics Process customer service recordings to extract transcripts and identify languages Content moderation Transcribe user-generated audio content across multiple languages Meeting transcription Convert multilingual meeting recordings to text for documentation Customer support sample prompt: Deploy Qwen3-ASR-1.7B to a Microsoft Foundry endpoint and transcribe multilingual customer service calls. Send audio files via API to automatically detect the language (from 52 supported options including 30 languages and 22 Chinese dialects) and generate accurate transcripts. Process calls from customers speaking English, Spanish, Mandarin, Cantonese, Arabic, French, and other languages without managing separate models per language. Use transcripts for quality assurance, compliance monitoring, and customer sentiment analysis. Tongyi-MAI: Z-Image Model Specs Parameters / size: 6B Context length: N/A (text-to-image) Primary task: Text-to-image generation Why it's interesting Undistilled foundation model: Full-capacity base without distillation preserves complete training signal with Classifier-Free Guidance support (a technique that improves prompt adherence and output quality), enabling complex prompt engineering and negative prompting that distilled models cannot achieve High output diversity: Generates distinct character identities in multi-person scenes with varied compositions, facial features, and lighting, critical for creative applications requiring visual variety rather than consistency Aesthetic versatility: Handles diverse visual styles from hyper-realistic photography to anime and stylized illustrations within a single model, supporting resolutions from 512×512 to 2048×2048 at any aspect ratio with 28-50 inference steps (Technical Paper) Try it Use Case Prompt Pattern Multilingual transcription Send audio files via API with automatic language detection Call center analytics Process customer service recordings to extract transcripts and identify languages Content moderation Transcribe user-generated audio content across multiple languages Meeting transcription Convert multilingual meeting recordings to text for documentation E-commerce sample prompt: Professional product photography of a modern ergonomic office chair in a bright Scandinavian-style home office. Natural window lighting from left, clean white desk with laptop and succulent plant, light oak hardwood floor. Chair positioned at 45-degree angle showing design details. Photorealistic, commercial photography, sharp focus, 85mm lens, f/2.8, soft shadows. Getting started You can deploy open‑source Hugging Face models directly in Microsoft Foundry by browsing the Hugging Face collection in the Foundry model catalog and deploying to managed endpoints in just a few clicks. You can also start from the Hugging Face Hub. First, select any supported model and then choose "Deploy on Microsoft Foundry", which brings you straight into Azure with secure, scalable inference already configured. Learn how to discover models and deploy them using Microsoft Foundry documentation. Follow along the Model Mondays series and access the GitHub to stay up to date on the latest Read Hugging Face on Azure docs Learn about one-click deployments from the Hugging Face Hub on Microsoft Foundry Explore models in Microsoft Foundry806Views0likes0CommentsWhat is trending in Hugging Face on Microsoft Foundry? Feb, 2, 2026
Open‑source AI is moving fast, with important breakthroughs in reasoning, agentic systems, multimodality, and efficiency emerging every day. Hugging Face has been a leading platform where researchers, startups, and developers share and discover new models. Microsoft Foundry brings these trending Hugging Face models into a production‑ready experience, where developers can explore, evaluate, and deploy them within their Azure environment. Our weekly Model Monday’s series highlights Hugging Face models available in Foundry, focusing on what matters most to developers: why a model is interesting, where it fits, and how to put it to work quickly. This week’s Model Mondays edition highlights three Hugging Face models, including a powerful Mixture-of-Experts model from Z. AI designed for lightweight deployment, Meta’s unified foundation model for image and video segmentation, and MiniMax’s latest open-source agentic model optimized for complex workflows. Models of the week Z.AI’s GLM-4.7-flash Model Basics Model name: zai-org/GLM-4.7-Flash Parameters / size: 30B total -3B active Default settings: 131,072 max new tokens Primary task: Agentic, Reasoning and Coding Why this model matters Why it’s interesting: It utilizes a Mixture-of-Experts (MoE) architecture (30B total parameters and 3B active parameters) to offer a new option for lightweight deployment. It demonstrates strong performance on logic and reasoning benchmarks, outperforming similar sized models like gpt-oss-20b on AIME 25 and GPQA benchmarks. It supports advanced inference features like "Preserved Thinking" mode for multi-turn agentic tasks. Best‑fit use cases: Lightweight local deployment, multi-turn agentic tasks, and logical reasoning applications. What’s notable: From the Foundry catalog, users can deploy on a A100 instance or unsloth/GLM-4.7-Flash-GGUF on a CPU. ource SOTA scores among models of comparable size. Additionally, compared to similarly sized models, GLM-4.7-Flash demonstrates superior frontend and backend development capabilities. Click to see more: https://docs.z.ai Try it Use case Best‑practice prompt pattern Agentic coding (multi‑step repo work, debugging, refactoring) Treat the model as an autonomous coding agent, not a snippet generator. Explicitly require task decomposition and step‑by‑step execution, then a single consolidated result. Long‑context agent workflows (local or low‑cost autonomous agents) Call out long‑horizon consistency and context preservation. Instruct the model to retain earlier assumptions and decisions across turns. Now that you know GLM‑4.7‑Flash works best when you give it a clear goal and let it reason through a bounded task, here’s an example prompt that a product or engineering team might use to identify risks and propose mitigations: You are a software reliability analyst for a mid‑scale SaaS platform. Review recent incident reports, production logs, and customer issues to uncover edge‑case failures outside normal usage (e.g., rare inputs, boundary conditions, timing/concurrency issues, config drift, or unexpected feature interactions). Prioritize low‑frequency, high‑impact risks that standard testing misses. Recommend minimal, low‑cost fixes (validation, guardrails, fallback logic, or documentation). Deliver a concise executive summary with sections: Observed Edge Cases, Root Causes, User Impact, Recommended Lightweight Fixes, and Validation Steps. Meta's Segment Anything 3 (SAM3) Model Basics Model name: facebook/sam3 Parameters / size: 0.9B Primary task: Mask Generation, Promptable Concept Segmentation (PCS) Why this model matters Why it’s interesting: It handles a vastly larger set of open-vocabulary prompts than SAM 2, and unifies image and video segmentation capabilities. It includes a "SAM 3 Tracker" mode that acts as a drop-in replacement for SAM 2 workflows with improved performance. Best‑fit use cases: Open-vocabulary object detection, video object tracking, and automatic mask generation What’s notable: Introduces Promptable Concept Segmentation (PCS), allowing users to find all matching objects (e.g., "dial") via text prompt rather than just single instances. Try it This model enables users to identify specific objects within video footage and isolate them over extended periods. With just one line of code, it is possible to detect multiple similar objects simultaneously. The accompanying GIF demonstrates how SAM3 efficiently highlights players wearing white on the field as they appear and disappear from view. Additional examples are available at the following repository: https://github.com/facebookresearch/sam3/blob/main/assets/player.gif Use case Best‑practice prompt pattern Agentic coding (multi‑step repo work, debugging, refactoring) Treat SAM 3 as a concept detector, not an interactive click tool. Use short, concrete noun‑phrase concept prompts instead of describing the scene or asking questions. Example prompt: “yellow school bus” or “shipping containers”. Avoid verbs or full sentences. Video segmentation + object tracking Specify the same concept prompt once, then apply it across the video sequence. Do not restate the prompt per frame. Let the model maintain identity continuity. Example: “person wearing a red jersey”. Hard‑to‑name or visually subtle objects Use exemplar‑based prompts (image region or box) when text alone is ambiguous. Optionally combine positive and negative exemplars to refine the concept. Avoid over‑constraining with long descriptions. Using the GIF above as a leading example, here is a prompt that shows how SAM 3 turns raw sports footage into structured, reusable data. By identifying and tracking players based on visual concepts like jersey color so that sports leagues can turn tracked data into interactive experiences where automated player identification can relay stats, fun facts, etc when built into a larger application. Here is a prompt that will allow you to start identifying specific players across video: Act as a sports analytics operator analyzing football match footage. Segment and track all football players wearing blue jerseys across the video. Generate pixel‑accurate segmentation masks for each player and assign persistent instance IDs that remain stable during camera movement, zoom, and player occlusion. Exclude referees, opposing team jerseys, sidelines, and crowd. Output frame‑level masks and tracking metadata suitable for overlays, player statistics, and downstream analytics pipelines. MiniMax AI's MiniMax-M2.1 Model Basics Model name: MiniMaxAI/MiniMax-M2.1 Parameters / size: 229B-10B Active Default settings: 200,000 max new tokens Primary task: Agentic and Coding Why this model matters Why it’s interesting: It is optimized for robustness in coding, tool use, and long-horizon planning, outperforming Claude Sonnet 4.5 in multilingual scenarios. It excels in full-stack application development, capable of architecting apps "from zero to one”. Previous coding models focused on Python optimization, M2.1 brings enhanced capabilities in Rust, Java, Golang, C++, Kotlin, Objective-C, TypeScript, JavaScript, and other languages. The model delivers exceptional stability across various coding agent frameworks. Best‑fit use cases: Lightweight local deployment, multi-turn agentic tasks, and logical reasoning applications. What’s notable: The release of open-source weights for M2.1 delivers a massive leap over M2 on software engineering leaderboards. https://www.minimax.io/ Try it Use case Best‑practice prompt pattern End‑to‑end agentic coding (multi‑file edits, run‑fix loops) Treat the model as an autonomous coding agent, not a snippet generator. Explicitly require task decomposition and step‑by‑step execution, then a single consolidated result. Long‑horizon tool‑using agents (shell, browser, Python) Explicitly request stepwise planning and sequential tool use. M2.1’s interleaved thinking and improved instruction‑constraint handling are designed for complex, multi‑step analytical tasks that require evidence tracking and coherent synthesis, not conversational back‑and‑forth. Long‑context reasoning & analysis (large documents / logs) Declare the scope and desired output structure up front. MiniMax‑M2.1 performs best when the objective and final artifact are clear, allowing it to manage long context and maintain coherence. Because MiniMax‑M2.1 is designed to act as a long‑horizon analytical agent, it shines when you give it a clear end goal and let it work through large volumes of information—here’s a prompt a risk or compliance team could use in practice: You are a financial risk analysis agent. Analyze the following transaction logs and compliance policy documents to identify potential regulatory violations and systemic risk patterns. Plan your approach before executing. Work through the data step by step, referencing evidence where relevant. Deliver a final report with the following sections: Key Risk Patterns Identified, Supporting Evidence, Potential Regulatory Impact, Recommended Mitigations. Your response should be a complete, executive-ready report, not a conversational draft. Getting started You can deploy open‑source Hugging Face models directly in Microsoft Foundry by browsing the Hugging Face collection in the Foundry model catalog and deploying to managed endpoints in just a few clicks. You can also start from the Hugging Face Hub. First, select any supported model and then choose "Deploy on Microsoft Foundry", which brings you straight into Azure with secure, scalable inference already configured. Learn how to discover models and deploy them using Microsoft Foundry documentation. Follow along the Model Mondays series and access the GitHub to stay up to date on the latest Read Hugging Face on Azure docs Learn about one-click deployments from the Hugging Face Hub on Microsoft Foundry Explore models in Microsoft Foundry1.1KViews0likes0Comments