foundry tools
14 TopicsBuilding Production-Ready, Secure, Observable, AI Agents with Real-Time Voice with Microsoft Foundry
We're excited to announce the general availability of Foundry Agent Service, Observability in Foundry Control Plane, and the Microsoft Foundry portal — plus Voice Live integration with Agent Service in public preview — giving teams a production-ready platform to build, deploy, and operate intelligent AI agents with enterprise-grade security and observability.5.1KViews2likes0CommentsGenerally Available: Evaluations, Monitoring, and Tracing in Microsoft Foundry
If you've shipped an AI agent to production, you've likely run into the same uncomfortable realization: the hard part isn't getting the agent to work - it's keeping it working. Models get updated, prompts get tweaked, retrieval pipelines drift, and user traffic surfaces edge cases that never appeared in your eval suite. Quality isn't something you establish once. It's something you have to continuously measure. Today, we're making that continuous measurement a first-class operational capability. Evaluations, Monitoring, and Tracing in Microsoft Foundry are now generally available through Foundry Control Plane. These aren't standalone tools bolted onto the side of the platform - they're deeply integrated with Azure Monitor, which means AI agent observability now lives in the same operational plane as the rest of your infrastructure. The Problem With Point-in-Time Evaluation Most evaluation workflows are designed around a pre-deployment gate. You build a test dataset, run your evals, review the scores, and ship. That approach has real value - but it has a hard ceiling. In production, agent behavior is a function of many things that change independently of your code: Foundation model updates ship continuously and can shift output style, reasoning patterns, and edge case handling in ways that don't always surface on your benchmark set. Prompt changes can have nonlinear effects downstream, especially in multi-step agentic flows. Retrieval pipeline drift changes what context your agent actually sees at inference time. A document index fresh last month may have stale or subtly different content today. Real-world traffic distribution is never exactly what you sampled for your test set. Production surfaces long-tail inputs that feel obvious in hindsight but were invisible during development. The implication is straightforward: evaluation has to be continuous, not episodic. You need quality signals at development time, at every CI/CD commit, and continuously against live production traffic - all using the same evaluator definitions so results are comparable across environments. That's the core design principle behind Foundry Observability. Continuous Evaluation Across the Full AI Lifecycle Built-In Evaluators Foundry's built-in evaluators cover the most critical quality and safety dimensions for production agent systems: Coherence and Relevance measure whether responses are internally consistent and on-topic relative to the input. These are table-stakes signals for any conversational or task-completion agent. Groundedness is particularly important for RAG-based architectures. It measures whether the model's output is actually supported by the retrieved context - as opposed to plausible-sounding content the model generated from its parametric memory. Groundedness failures are a leading indicator of hallucination risk in production, and they're often invisible to human reviewers at scale. Retrieval Quality evaluates the retrieval step independently from generation. Groundedness failures can originate in two places: the model may be ignoring good context, or the retrieval pipeline may not be surfacing relevant context in the first place. Splitting these signals makes it much easier to pinpoint root cause. Safety and Policy Alignment evaluates whether outputs meet your deployment's policy requirements - content safety, topic restrictions, response format compliance, and similar constraints. These evaluators are designed to run at every stage of the AI lifecycle: Local development - run evals inline as you iterate on prompts, retrieval config, or orchestration logic CI/CD pipelines - gate every commit against your quality baselines; catch regressions before they reach production Production traffic monitoring - continuously evaluate sampled live traffic and surface trends over time Because the evaluators are identical across all three contexts, a score in CI means the same thing as a score in production monitoring. See the Practical Guide to Evaluations and the Built-in Evaluators Reference for a deeper walkthrough. Custom Evaluators - Encoding Your Own Definition of Quality Built-in evaluators cover common signals well, but production agents often need to satisfy criteria specific to a domain, regulatory environment, or internal standard. Foundry supports two types of custom evaluators (currently in public preview): LLM-as-a-Judge evaluators let you configure a prompt and grading rubric, then use a language model to apply that rubric to your agent's outputs. This is the right approach for quality dimensions that require reasoning or contextual judgment - whether a response appropriately acknowledges uncertainty, whether a customer-facing message matches your brand tone, or whether a clinical summary meets documentation standards. You write a judge prompt with a scoring scale (e.g., 1–5 with criteria for each level) that evaluates a given {input} / {response} pair. Foundry runs this at scale and aggregates scores into your dashboards alongside built-in results. Code-based evaluators are Python functions that implement any evaluation logic you can express programmatically - regex matching, schema validation, business rule checks, compliance assertions, or calls to external systems. If your organization has documented policies about what a valid agent response looks like, you can encode those policies directly into your evaluation pipeline. Custom and built-in evaluators compose naturally - running against the same traffic, producing results in the same schema, feeding into the same dashboards and alert rules. Monitoring and Alerting - AI Quality as an Operational Signal All observability data produced by Foundry - evaluation results, traces, latency, token usage, and quality metrics - is published directly to Azure Monitor. This is where the integration pays off for teams already on Azure. What this enables that siloed AI monitoring tools can't: Cross-stack correlation. When your groundedness score drops, is it a model update, a retrieval pipeline issue, or an infrastructure problem affecting latency? With AI quality signals and infrastructure telemetry in the same Azure Monitor Application Insights workspace, you can answer that in minutes rather than hours of manual correlation across disconnected systems. Unified alerting. Configure Azure Monitor alert rules on any evaluation metric - trigger a PagerDuty incident when groundedness drops below threshold, send a Teams notification when safety violations spike, or create automated runbook responses when retrieval quality degrades. These are the same alert mechanisms your SRE team already uses. Enterprise governance by default. Azure Monitor's RBAC, retention policies, diagnostic settings, and audit logging apply automatically to all AI observability data. You inherit the governance framework your organization has already built and approved. Grafana and existing dashboards. If your team uses Azure Managed Grafana, evaluation metrics can flow into existing dashboards alongside your other operational metrics - a single pane of glass for application health, infrastructure performance, and AI agent quality. The Agent Monitoring Dashboard in the Foundry portal provides an AI-native view out of the box - evaluation metric trends, safety threshold status, quality score distributions, and latency breakdowns. Everything in that dashboard is backed by Azure Monitor data, so SRE teams can always drill deeper. End-to-End Tracing: From Quality Signal to Root Cause A groundedness score tells you something is wrong. A trace tells you exactly where the failure occurred and what the agent actually did. Foundry provides OpenTelemetry-based distributed tracing that follows each request through your entire agent system: model calls, tool invocations, retrieval steps, orchestration logic, and cross-agent handoffs. Traces capture the full execution path - inputs, outputs, latency at each step, tool call parameters and responses, and token usage. The key design decision: evaluation results are linked directly to traces. When you see a low groundedness score in your monitoring dashboard, you navigate directly to the specific trace that produced it - no manual timestamp correlation, no separate trace ID lookup. The connection is made automatically. Foundry auto-collects traces across the frameworks your agents are likely already built on: Microsoft Agent Framework Semantic Kernel LangChain and LangGraph OpenAI Agents SDK For custom or less common orchestration frameworks, the Azure Monitor OpenTelemetry Distro provides an instrumentation path. Microsoft is also contributing upstream to the OpenTelemetry project - working with Cisco Outshift, we've contributed semantic conventions for multi-agent trace correlation, standardizing how agent identity, task context, and cross-agent handoffs are represented in OTel spans. Note: Tracing is currently in public preview, with GA shipping by end of March. Prompt Optimizer (Public Preview) One persistent friction point in agent development is the iteration loop between writing prompts and measuring their effect. You make a change, run your evals, look at the delta, try to infer what about the change mattered, and repeat. Prompt Optimizer tightens this loop. It analyzes your existing prompt and applies structured prompt engineering techniques - clarifying ambiguous instructions, improving formatting for model comprehension, restructuring few-shot examples, making implicit constraints explicit - with paragraph-level explanations for every change it makes. The transparency is deliberate. Rather than producing a black-box "optimized" prompt, it shows you exactly what it changed and why. You can add constraints, trigger another optimization pass, and iterate until satisfied. When you're done, apply it with one click. The value compounds alongside continuous evaluation: run your eval suite against the current prompt, optimize, run evals again, see the measured improvement. That feedback loop - optimize, measure, optimize - is the closest thing to a systematic approach to prompt engineering that currently exists. What Makes our Approach to Observability Different There are other evaluation and observability tools in the AI ecosystem. The differentiation in Foundry's approach comes down to specific architectural choices: Unified lifecycle coverage, not just pre-deployment testing. Most existing evaluation tools are designed for offline, pre-deployment use. Foundry's evaluators run in the same form at development time, in CI/CD, and against live production traffic. Your quality metrics are actually comparable across the lifecycle - you can tell whether production quality matches what you saw in testing, rather than operating two separate measurement systems that can't be compared. No separate observability silo. Publishing all observability data to Azure Monitor means you don't operate a separate system for AI quality alongside your existing infrastructure monitoring. AI incidents route through your existing on-call rotations. AI quality data is subject to the same retention and compliance controls as the rest of your telemetry. Framework-agnostic tracing. Auto-instrumentation across Semantic Kernel, LangChain, LangGraph, and the OpenAI Agents SDK means you're not locked into a specific orchestration framework. The OpenTelemetry foundation means trace data is portable to any compatible backend, protecting your investment as the tooling landscape evolves. Composable evaluators. Built-in and custom evaluators run in the same pipeline, against the same traffic, producing results in the same schema, feeding into the same dashboards and alert rules. You don't choose between generic coverage and domain-specific precision - you get both. Evaluation linked to traces. Most systems treat evaluation and tracing as separate concerns. Foundry treats them as two views of the same event - closing the loop between detecting a quality problem and diagnosing it. Getting Started If you're building agents on Microsoft Foundry, or using Semantic Kernel, LangChain, LangGraph, or the OpenAI Agents SDK and want to add production observability, the entry point is Foundry Control Plane. Try it You'll need a Foundry project with an agent and an Azure OpenAI deployment. Enable observability by navigating to Foundry Control Plane and connecting your Azure Monitor workspace. Then walk through the Practical Guide to Evaluations, explore the Built-in Evaluators Reference, and set up end-to-end tracing for your agents.2.3KViews1like0CommentsMicrosoft Foundry: An End-to-End Platform for Building, Governing, and Scaling AI
Microsoft Foundry: What It Is and How to Get Started As organizations accelerate their adoption of AI, one challenge consistently emerges: how to move from experimentation to production at scale in a secure, responsible, and efficient way. Microsoft Foundry exists to address exactly that challenge. What Is Microsoft Foundry? Microsoft Foundry is an end-to-end platform experience that brings together Microsoft’s AI development, deployment, and governance capabilities into a unified environment. It enables developers, data scientists, and enterprises to build, customize, deploy, and operate AI solutions, including generative AI, using Microsoft and partner models, tools, and services. Rather than being a single product, Foundry is a curated and integrated experience that spans model access, tooling, orchestration, evaluation, and enterprise-grade controls. Why Microsoft Foundry Exists AI innovation is moving fast, but enterprise adoption requires more than just access to models. Customers need: A consistent way to work with multiple models, including Microsoft, OpenAI, and open-source models Built-in security, compliance, and responsible AI capabilities Tooling that supports the full AI lifecycle, not just prototyping Seamless integration with existing data platforms, applications, and cloud operations Microsoft Foundry was created to reduce friction between experimentation and real-world deployment while aligning AI development with enterprise standards, governance, and scale. What’s Included in Microsoft Foundry? Microsoft Foundry brings together several key capabilities: 1. Model Choice and Flexibility Access to leading foundation models, including Azure OpenAI models and selected open-source models Ability to evaluate and select models based on performance, cost, and use case 2. AI Development and Orchestration Tools Prompt engineering, fine-tuning, and grounding with enterprise data Tools for building copilots, chat experiences, and AI-powered applications Orchestration across tools, APIs, and workflows 3. Evaluation, Safety, and Responsible AI Built-in evaluation for quality, latency, and cost Content safety, monitoring, and governance controls Alignment with Microsoft’s Responsible AI principles 4. Enterprise-Grade Platform Integration Native integration with Azure, Microsoft Fabric, Power Platform, and developer tools Identity, security, and compliance through Microsoft Entra and Azure controls Observability and lifecycle management for production workloads How to Get Started Getting started with Microsoft Foundry is straightforward: Start in Azure - Use Azure as the control plane to access Foundry experiences and services. Explore models and tools - Experiment with available models, build prompts, and prototype AI workflows. Ground with your data - Connect enterprise data securely to create more relevant and contextual AI experiences. Evaluate and deploy - Use built-in evaluation and safety tools, then deploy AI solutions into production with confidence. Scale responsibly - Apply governance, monitoring, and cost controls as adoption grows. Final Thoughts Microsoft Foundry represents Microsoft’s vision for enterprise AI done right. It offers flexible model choice, strong development tooling, and built-in trust. By unifying AI development and operations into a single experience, Foundry helps organizations move faster while staying secure, compliant, and future-ready. Whether you are just starting with generative AI or scaling existing solutions, Microsoft Foundry provides a practical foundation to build on.928Views1like0CommentsCybersecurity in the Age of Digital Acceleration: Securing Intelligence, Assets, and Trust
Over the past four decades, Information Technology has evolved from modest on-premise systems with limited storage to a boundless, cloud-driven ecosystem that powers global commerce, governance, defense, and daily life. What began in the mid-1980s as hardware-centric computing has transformed into an intelligent, distributed, always-on digital universe. Today, storage is virtually infinite. Processing is instantaneous. Markets operate 24/7. Transactions occur across continents in milliseconds. Physical boundaries have dissolved into digital connectivity. But in this era of extraordinary progress, one discipline has become indispensable: Cybersecurity. From Digitization to Intelligence The early waves of digital transformation converted manual processes into electronic systems—banking, records, communications, and trade. The second wave connected everything, linking enterprises, governments, devices, and supply chains into global digital ecosystems. We are now in the third wave: intelligent systems powered by artificial intelligence. AI is no longer a supporting tool; it is becoming a decision engine, shaping outcomes across financial markets, healthcare diagnostics, defense systems, logistics optimization, and enterprise automation. As intelligence increases, so does risk. Human intelligence built digital infrastructure; artificial intelligence now operates within it. Without responsible governance, AI systems can amplify bias, automate vulnerabilities, and accelerate systemic risk at unprecedented scale. Cybersecurity, therefore, is no longer just about protecting networks and systems. It is about protecting intelligence itself. From Intelligence to Orchestration: The Rise of AI Platforms As artificial intelligence matures, the challenge is no longer building models. It is operationalizing intelligence safely and at scale across complex enterprises. Organizations now run ecosystems of intelligence—multiple models, agents, data sources, and automated decisions spanning business units, geographies, and regulations. Managing this complexity requires more than tools; it requires orchestration. Microsoft Foundry marks this shift—from isolated AI capabilities to a governed, enterprise‑grade AI operating fabric. It is not about generating intelligence, but about controlling how intelligence is created, grounded, deployed, monitored, and trusted. Just as cloud platforms abstracted infrastructure complexity, AI platforms now abstract cognitive complexity—embedding security, governance, and accountability by design. Intelligence at Scale Requires Structure Unstructured intelligence introduces enterprise risk. Models drift without governance. Agents hallucinate without oversight. Poorly controlled data grounding exposes sensitive information. At scale, these failures are not theoretical—they are operational, financial, and reputational risks. As organizations embed AI into financial decisioning, customer engagement, supply chain optimization, healthcare diagnostics, and critical infrastructure, intelligence must operate within clear and enforceable guardrails. Reliability, security, and accountability are prerequisites for adoption at enterprise scale. Foundry provides a disciplined approach to enterprise AI. Intelligence is managed as production‑grade projects, not isolated experiments. Models are intentionally selected, benchmarked, and upgraded without disrupting live systems. Agents are empowered to act, but only within clearly defined permissions and policies. Enterprise knowledge remains grounded in trusted data, with identity, access controls, and compliance preserved end‑to‑end. Observability, evaluation, and auditability are built in by design—enabling leaders to understand, govern, and stand behind AI‑driven outcomes. This progression mirrors the evolution of cybersecurity itself: from fragmented, reactive controls to a unified, systemic architecture designed for scale, trust, and resilience. AI Agents: Automation with Accountability The next phase of AI is not conversational—it is agentic. Foundry introduces controlled autonomy: agents that are capable by design, but constrained by enforceable guardrails. These include identity boundaries, role‑based access control, data permissions, policy enforcement, and continuous monitoring. This applies a core cybersecurity principle directly to AI systems: least privilege, extended to intelligence itself. In this model, AI agents function as digital employees—highly capable and always on—but governed by the same trust, access, and accountability frameworks that secure human operators in production environments. The Evolution of Threats As technology advanced, threats evolved in parallel. Physical theft gave way to digital fraud, bank robberies became ransomware attacks, espionage shifted into data exfiltration, and counterfeiting transformed into identity theft. Crime adapted as systems digitized. Policing adapted in response. Ethical hacking, penetration testing, zero‑trust architectures, and advanced threat intelligence emerged to counter increasingly sophisticated adversaries. Cybersecurity evolved from static perimeter defense into predictive, AI‑driven protection models capable of identifying threats before exploitation occurs. The battlefield has now shifted decisively—from physical borders to cloud infrastructure. Digital Assets, Digital Wealth, Digital Risk Money itself has transformed. Physical currency evolved into digital banking, digital banking into real‑time payments, and cryptographic systems introduced decentralized finance. Today, tokenized assets and their underlying digital representations increasingly influence global markets. Platforms such as Foundry provide the resilient, scalable infrastructure required to support this shift—from financial services modernization to blockchain integration. As cryptocurrencies like Bitcoin and Ethereum redefine asset ownership and value exchange, economic systems are becoming dependent on cryptographic trust models rather than institutional intermediaries alone. Trade now happens at the tap of a screen. Assets reside in invisible vaults—cloud environments. Markets operate continuously, unconstrained by geography or time zones. Where wealth is digital, security must be digital. Where identity is virtual, trust must be algorithmic. And where assets are tokenized, integrity must be cryptographically enforced. Blockchain and National Security Blockchain technology introduces transparency, immutability, and distributed trust. Beyond cryptocurrencies, it is increasingly shaping critical domains such as cross‑border trade finance, defense supply‑chain traceability, secure digital identity frameworks, and smart contracts that enable automated compliance. For national economies and defense ecosystems, the convergence of AI and blockchain is powerful—but highly sensitive. A vulnerability in decentralized infrastructure can cascade globally, while a compromised AI model can influence economic or defense decisions at machine speed. Scale and autonomy magnify both impact and risk. Cybersecurity must therefore operate across three critical layers. Infrastructure security ensures cloud, network, and endpoint resilience. Data and identity protection enforce encryption, zero‑trust access, and secure authentication. AI governance and integrity safeguard models through adversarial defense, policy controls, and ethical AI compliance. Together, these layers form the foundation for securing intelligent, decentralized systems in an increasingly automated world. Responsible AI: Security Beyond Code As AI integrates into economic systems, financial markets, defense analytics, and public infrastructure, the responsibility associated with its deployment grows exponentially. Intelligence at scale amplifies both capability and consequence. Unmonitored AI systems can amplify misinformation, manipulate financial signals, expose sensitive defense intelligence, and automate systemic vulnerabilities. At machine speed, these failures propagate faster than traditional controls can respond. Responsible AI, therefore, is not merely an ethical aspiration—it is a cybersecurity mandate. Security must be embedded end‑to‑end, spanning data pipelines, training datasets, model validation, deployment environments, and continuous monitoring systems. AI governance is no longer a parallel concern. It is inseparable from modern cybersecurity architecture. Zero-Trust in a Borderless World Geographical boundaries no longer define risk exposure. Enterprises operate across jurisdictions, workforces are increasingly remote, and supply chains are fully digital. As a result, trust assumptions based on location or network perimeter no longer hold. The modern security model is zero trust: never assume, always verify. Every access request must be authenticated, every transaction validated, and every anomaly analyzed in real time—regardless of where it originates. Security is no longer reactive. It is predictive, adaptive, and continuously enforced across identity, data, and systems. The Economic Imperative The growth of digital currencies, tokenized commodities, and algorithm‑driven markets introduces both innovation and systemic complexity. Assets that were once physical or institutionally mediated—gold, securities, and identity—are now increasingly represented as digital, cryptographic constructs. Digital gold. Digital silver. Digital securities. Digital identity. Each reflects a broader shift: underlying economic value is now encoded, transferred, and settled through cryptographic systems rather than physical custody or manual processes. The integrity of these systems underpins economic stability itself. As a result, cybersecurity is no longer just an IT concern: it functions as an economic stabilizer, protecting trust, value, and market confidence in a fully digital financial world. The Road Ahead If the past four decades transformed hardware into intelligence, the decades ahead will transform intelligence into autonomy. Autonomous finance, logistics, defense systems, and AI agents will increasingly plan, decide, and act without continuous human intervention. The question is not whether this evolution will continue—it will. The question is whether security evolves faster than risk. In an autonomous world, cybersecurity must lead innovation, not follow it. In an era defined by AI, blockchain, digital currencies, and cloud‑native economies, security becomes the silent architecture of trust. Foundry represents one step in this evolution—where intelligence, security, and governance converge into a unified operational fabric. Without such foundations, digital transformation collapses under its own risk. With them, digital evolution becomes sustainable. Cybersecurity is no longer a protective layer. It is the foundation of the digital future.233Views1like0CommentsBuilding Knowledge-Grounded Conversational AI Agents with Azure Speech Photo Avatars
From Chat to Presence: The Next Step in Conversational AI Chat agents are now embedded across nearly every industry, from customer support on websites to direct integrations inside business applications designed to boost efficiency and productivity. As these agents become more capable and more visible, user expectations are also rising: conversations should feel natural, trustworthy, and engaging. While text‑only chat agents work well for many scenarios, voice‑enabled agents take a meaningful step forward by introducing a clearer persona and a stronger sense of presence, making interactions feel more human and intuitive (see healow Genie success story). In domains such as Retail, Healthcare, Education, and Corporate Training, adding a visual dimension through AI avatars further elevates the experience. Pairing voice with a lifelike visual representation improves inclusiveness, reduces interaction friction, and helps users better contextualize conversations—especially in scenarios that rely on trust, guidance, or repeated engagement. To support these experiences, Microsoft offers two AI avatar options through Azure Speech: Video Avatars, which are generally available and provide full‑ or partial‑body immersive representations, and Photo Avatars, currently in public preview, which deliver a headshot‑style visual well suited for web‑based agents and digital twin scenarios. Both options support custom avatars, enabling organizations to reflect their brand identity rather than relying solely on generic representations (see W2M custom video avatar). Choosing between Video Avatars and Photo Avatars is less about preference and more about intent. Video Avatars offer higher visual fidelity and immersion but require more extensive onboarding, such as high-quality recorded video of an avatar talent. Photo Avatars, by contrast, can be created from a single image, enabling a lighter‑weight onboarding process while still delivering a human‑centered experience. The right choice depends on the desired interaction style, visual presence, and target deployment scenario. What this solution demonstrates In this post, I walk through how to integrate Azure Speech Photo Avatars — powered by Microsoft Research's VASA-1 model — into a knowledge‑grounded conversational AI agent built on Azure AI Search. The goal is to show how voice, visuals, and retrieval‑augmented generation (RAG) can come together to create a more natural and engaging agent experience. The solution exposes a web‑based interface where users can speak naturally to the AI agent using their voice. The agent responds in real time using synthesized speech, while live transcriptions of the conversation are displayed in the UI to improve clarity and accessibility. To help compare different interaction patterns, the sample application supports three modes: 1) Photo Avatar mode, which adds a lifelike visual presence. 2) Video Avatar mode, which provides a more immersive, full‑motion experience. 3) Voice‑only mode, which focuses purely on speech‑to‑speech interaction. Key architectural components An end‑to‑end architecture for the solution is shown in the diagram below. The solution is composed of the following core services and building blocks: Microsoft Foundry — provides the platform for deploying, managing, and accessing the foundation models used by the application. Azure OpenAI — provides the Realtime API for speech‑to‑speech interaction in the voice‑only mode and the Chat Completions API used by backend services for reasoning and conversational responses. gpt‑4.1 — LLM used for reasoning tasks such as deciding when to invoke tool calls and summarizing responses. gpt-realtime-mini — LLM used for speech-to-speech interaction in the Voice-only mode. text‑embedding‑3‑large — LLM used for generating vector embeddings used in retrieval‑augmented generation. Azure Speech — delivers the real‑time speech‑to‑text (STT), text‑to‑speech (TTS), and AI avatars capabilities for both Photo Avatar and Video Avatar experiences. Azure Document Intelligence — extracts structured text, layout, and key information from source documents used to build the knowledge base. Azure AI Search — provides vector‑based retrieval to ground the language model with relevant, context‑aware content. Azure Container Apps — hosts the web UI frontend, backend services, and MCP server within a managed container runtime. Azure Container Apps Environment — defines a secure and isolated boundary for networking, scaling, and observability of the containerized workloads. Azure Container Registry — stores and manages Docker images used by the container applications. How you can try it yourself The complete sample implementation is available in the LiveChat AI Voice Assistant repository, which includes instructions for deploying the solution into your Azure environment. The repository uses Infrastructure as Code (IaC) deployment via Azure Developer CLI (azd) to orchestrate Azure resource provisioning and application deployment. Prerequisites: An Azure subscription with appropriate services and models' quota is required to deploy the solution. Getting the solution up and running in just three simple steps: Clone the repository and navigate to the project git clone https://github.com/mardianto-msft/azure-speech-ai-avatars.git cd azure-speech-ai-avatars Authenticate with Azure azd auth login Initialize and deploy the solution azd up Once deployed, you can access the sample application by opening the frontend service URL in a web browser. To demonstrate knowledge grounding, the sample includes source documents derived from Microsoft’s 2025 Annual Report and Shareholder Letter. These grounding documents can optionally be replaced with your own data, allowing the same architecture to be reused for domain‑specific or enterprise scenarios. When using the provided sample documents, you can ask questions such as: “How much was Microsoft’s net income in 2025?”, “What are Microsoft’s priorities according to the shareholder letter?”, “Who is Microsoft’s CEO?” Bringing Conversational AI Agents to Life This implementation of Azure Speech Photo Avatars serves as a practical starting point for building more engaging, knowledge‑grounded conversational AI agents. By combining voice interaction, visual presence, and retrieval‑augmented generation, Photo Avatars offer a lightweight yet powerful way to make AI agents feel more approachable, trustworthy, and human‑centered — especially in web‑based and enterprise scenarios. From here, the solution can be extended over time with capabilities such as long‑term memory, richer personalization, or more advanced multi‑agent orchestration. Whether used as a reference architecture or as the foundation for a production system, this approach demonstrates how Azure Speech Photo Avatars can help bridge the gap between conversational intelligence and meaningful user experience. By emphasizing accessibility, trust, and human‑centered design, it reflects Microsoft’s broader mission to empower every person and every organization on the planet to achieve more.433Views0likes0CommentsNow in Foundry: Qwen3-Coder-Next, Qwen3-ASR-1.7B, Z-Image
This week's spotlight features three models from that demonstrate enterprise-grade AI across the full scope of modalities. From low latency coding agents to state-of-the-art multilingual speech recognition and foundation-quality image generation, these models showcase the breadth of innovation happening in open-source AI. Each model balances performance with practical deployment considerations, making them viable for production systems while pushing the boundaries of what's possible in their respective domains. This week's Model Mondays edition highlights Qwen3-Coder-Next, an 80B MoE model that activates only 3B parameters while delivering coding agent capabilities with 256k context; Qwen3-ASR-1.7B, which achieves state-of-the-art accuracy across 52 languages and dialects; and Z-Image from Tongyi-MAI, an undistilled text-to-image foundation model with full Classifier-Free Guidance support for professional creative workflows. Models of the week Qwen: Qwen3-Coder-Next Model Specs Parameters / size: 80B total (3B activated) Context length: 262,144 tokens Primary task: Text generation (coding agents, tool use) Why it's interesting Extreme efficiency: Activates only 3B of 80B parameters while delivering performance comparable to models with 10-20x more active parameters, making advanced coding agents viable for local deployment on consumer hardware Built for agentic workflows: Excels at long-horizon reasoning, complex tool usage, and recovering from execution failures, a critical capability for autonomous development that go beyond simple code completion Benchmarks: Competitive performance with significantly larger models on SWE-bench and coding benchmarks (Technical Report) Try it Use Case Prompt Pattern Code generation with tool use Provide task context, available tools, and execution environment details Long-context refactoring Include full codebase context within 256k window with specific refactoring goals Autonomous debugging Present error logs, stack traces, and relevant code with failure recovery instructions Multi-file code synthesis Describe architecture requirements and file structure expectations Financial services sample prompt: You are a coding agent for a fintech platform. Implement a transaction reconciliation service that processes batches of transactions, detects discrepancies between internal records and bank statements, and generates audit reports. Use the provided database connection tool, logging utility, and alert system. Handle edge cases including partial matches, timing differences, and duplicate transactions. Include unit tests with 90%+ coverage. Qwen: Qwen3-ASR-1.7B Model Specs Parameters / size: 1.7B Context length: 256 tokens (default), configurable up to 4096 Primary task: Automatic speech recognition (multilingual) Why it's interesting All-in-one multilingual capability: Single 1.7B model handles language identification plus speech recognition for 30 languages, 22 Chinese dialects, and English accents from multiple regions—eliminating the need to manage separate models per language Specialized audio versatility: Transcribes not just clean speech but singing voice, songs with background music, and extended audio files, expanding use cases beyond traditional ASR to entertainment and media workflows State-of-the-art accuracy: Outperforms GPT-4o, Gemini-2.5, and Whisper-large-v3 across multiple benchmarks. English: Tedlium 4.50 WER vs 7.69/6.15/6.84; Chinese: WenetSpeech 4.97/5.88 WER vs 15.30/14.43/9.86 (Technical Paper) Language ID included: 97.9% average accuracy across benchmark datasets for automatic language identification, eliminating the need for separate language detection pipelines Try it Use Case Prompt Pattern Multilingual transcription Send audio files via API with automatic language detection Call center analytics Process customer service recordings to extract transcripts and identify languages Content moderation Transcribe user-generated audio content across multiple languages Meeting transcription Convert multilingual meeting recordings to text for documentation Customer support sample prompt: Deploy Qwen3-ASR-1.7B to a Microsoft Foundry endpoint and transcribe multilingual customer service calls. Send audio files via API to automatically detect the language (from 52 supported options including 30 languages and 22 Chinese dialects) and generate accurate transcripts. Process calls from customers speaking English, Spanish, Mandarin, Cantonese, Arabic, French, and other languages without managing separate models per language. Use transcripts for quality assurance, compliance monitoring, and customer sentiment analysis. Tongyi-MAI: Z-Image Model Specs Parameters / size: 6B Context length: N/A (text-to-image) Primary task: Text-to-image generation Why it's interesting Undistilled foundation model: Full-capacity base without distillation preserves complete training signal with Classifier-Free Guidance support (a technique that improves prompt adherence and output quality), enabling complex prompt engineering and negative prompting that distilled models cannot achieve High output diversity: Generates distinct character identities in multi-person scenes with varied compositions, facial features, and lighting, critical for creative applications requiring visual variety rather than consistency Aesthetic versatility: Handles diverse visual styles from hyper-realistic photography to anime and stylized illustrations within a single model, supporting resolutions from 512×512 to 2048×2048 at any aspect ratio with 28-50 inference steps (Technical Paper) Try it Use Case Prompt Pattern Multilingual transcription Send audio files via API with automatic language detection Call center analytics Process customer service recordings to extract transcripts and identify languages Content moderation Transcribe user-generated audio content across multiple languages Meeting transcription Convert multilingual meeting recordings to text for documentation E-commerce sample prompt: Professional product photography of a modern ergonomic office chair in a bright Scandinavian-style home office. Natural window lighting from left, clean white desk with laptop and succulent plant, light oak hardwood floor. Chair positioned at 45-degree angle showing design details. Photorealistic, commercial photography, sharp focus, 85mm lens, f/2.8, soft shadows. Getting started You can deploy open‑source Hugging Face models directly in Microsoft Foundry by browsing the Hugging Face collection in the Foundry model catalog and deploying to managed endpoints in just a few clicks. You can also start from the Hugging Face Hub. First, select any supported model and then choose "Deploy on Microsoft Foundry", which brings you straight into Azure with secure, scalable inference already configured. Learn how to discover models and deploy them using Microsoft Foundry documentation. Follow along the Model Mondays series and access the GitHub to stay up to date on the latest Read Hugging Face on Azure docs Learn about one-click deployments from the Hugging Face Hub on Microsoft Foundry Explore models in Microsoft Foundry861Views0likes0CommentsWhat is trending in Hugging Face on Microsoft Foundry? Feb, 2, 2026
Open‑source AI is moving fast, with important breakthroughs in reasoning, agentic systems, multimodality, and efficiency emerging every day. Hugging Face has been a leading platform where researchers, startups, and developers share and discover new models. Microsoft Foundry brings these trending Hugging Face models into a production‑ready experience, where developers can explore, evaluate, and deploy them within their Azure environment. Our weekly Model Monday’s series highlights Hugging Face models available in Foundry, focusing on what matters most to developers: why a model is interesting, where it fits, and how to put it to work quickly. This week’s Model Mondays edition highlights three Hugging Face models, including a powerful Mixture-of-Experts model from Z. AI designed for lightweight deployment, Meta’s unified foundation model for image and video segmentation, and MiniMax’s latest open-source agentic model optimized for complex workflows. Models of the week Z.AI’s GLM-4.7-flash Model Basics Model name: zai-org/GLM-4.7-Flash Parameters / size: 30B total -3B active Default settings: 131,072 max new tokens Primary task: Agentic, Reasoning and Coding Why this model matters Why it’s interesting: It utilizes a Mixture-of-Experts (MoE) architecture (30B total parameters and 3B active parameters) to offer a new option for lightweight deployment. It demonstrates strong performance on logic and reasoning benchmarks, outperforming similar sized models like gpt-oss-20b on AIME 25 and GPQA benchmarks. It supports advanced inference features like "Preserved Thinking" mode for multi-turn agentic tasks. Best‑fit use cases: Lightweight local deployment, multi-turn agentic tasks, and logical reasoning applications. What’s notable: From the Foundry catalog, users can deploy on a A100 instance or unsloth/GLM-4.7-Flash-GGUF on a CPU. ource SOTA scores among models of comparable size. Additionally, compared to similarly sized models, GLM-4.7-Flash demonstrates superior frontend and backend development capabilities. Click to see more: https://docs.z.ai Try it Use case Best‑practice prompt pattern Agentic coding (multi‑step repo work, debugging, refactoring) Treat the model as an autonomous coding agent, not a snippet generator. Explicitly require task decomposition and step‑by‑step execution, then a single consolidated result. Long‑context agent workflows (local or low‑cost autonomous agents) Call out long‑horizon consistency and context preservation. Instruct the model to retain earlier assumptions and decisions across turns. Now that you know GLM‑4.7‑Flash works best when you give it a clear goal and let it reason through a bounded task, here’s an example prompt that a product or engineering team might use to identify risks and propose mitigations: You are a software reliability analyst for a mid‑scale SaaS platform. Review recent incident reports, production logs, and customer issues to uncover edge‑case failures outside normal usage (e.g., rare inputs, boundary conditions, timing/concurrency issues, config drift, or unexpected feature interactions). Prioritize low‑frequency, high‑impact risks that standard testing misses. Recommend minimal, low‑cost fixes (validation, guardrails, fallback logic, or documentation). Deliver a concise executive summary with sections: Observed Edge Cases, Root Causes, User Impact, Recommended Lightweight Fixes, and Validation Steps. Meta's Segment Anything 3 (SAM3) Model Basics Model name: facebook/sam3 Parameters / size: 0.9B Primary task: Mask Generation, Promptable Concept Segmentation (PCS) Why this model matters Why it’s interesting: It handles a vastly larger set of open-vocabulary prompts than SAM 2, and unifies image and video segmentation capabilities. It includes a "SAM 3 Tracker" mode that acts as a drop-in replacement for SAM 2 workflows with improved performance. Best‑fit use cases: Open-vocabulary object detection, video object tracking, and automatic mask generation What’s notable: Introduces Promptable Concept Segmentation (PCS), allowing users to find all matching objects (e.g., "dial") via text prompt rather than just single instances. Try it This model enables users to identify specific objects within video footage and isolate them over extended periods. With just one line of code, it is possible to detect multiple similar objects simultaneously. The accompanying GIF demonstrates how SAM3 efficiently highlights players wearing white on the field as they appear and disappear from view. Additional examples are available at the following repository: https://github.com/facebookresearch/sam3/blob/main/assets/player.gif Use case Best‑practice prompt pattern Agentic coding (multi‑step repo work, debugging, refactoring) Treat SAM 3 as a concept detector, not an interactive click tool. Use short, concrete noun‑phrase concept prompts instead of describing the scene or asking questions. Example prompt: “yellow school bus” or “shipping containers”. Avoid verbs or full sentences. Video segmentation + object tracking Specify the same concept prompt once, then apply it across the video sequence. Do not restate the prompt per frame. Let the model maintain identity continuity. Example: “person wearing a red jersey”. Hard‑to‑name or visually subtle objects Use exemplar‑based prompts (image region or box) when text alone is ambiguous. Optionally combine positive and negative exemplars to refine the concept. Avoid over‑constraining with long descriptions. Using the GIF above as a leading example, here is a prompt that shows how SAM 3 turns raw sports footage into structured, reusable data. By identifying and tracking players based on visual concepts like jersey color so that sports leagues can turn tracked data into interactive experiences where automated player identification can relay stats, fun facts, etc when built into a larger application. Here is a prompt that will allow you to start identifying specific players across video: Act as a sports analytics operator analyzing football match footage. Segment and track all football players wearing blue jerseys across the video. Generate pixel‑accurate segmentation masks for each player and assign persistent instance IDs that remain stable during camera movement, zoom, and player occlusion. Exclude referees, opposing team jerseys, sidelines, and crowd. Output frame‑level masks and tracking metadata suitable for overlays, player statistics, and downstream analytics pipelines. MiniMax AI's MiniMax-M2.1 Model Basics Model name: MiniMaxAI/MiniMax-M2.1 Parameters / size: 229B-10B Active Default settings: 200,000 max new tokens Primary task: Agentic and Coding Why this model matters Why it’s interesting: It is optimized for robustness in coding, tool use, and long-horizon planning, outperforming Claude Sonnet 4.5 in multilingual scenarios. It excels in full-stack application development, capable of architecting apps "from zero to one”. Previous coding models focused on Python optimization, M2.1 brings enhanced capabilities in Rust, Java, Golang, C++, Kotlin, Objective-C, TypeScript, JavaScript, and other languages. The model delivers exceptional stability across various coding agent frameworks. Best‑fit use cases: Lightweight local deployment, multi-turn agentic tasks, and logical reasoning applications. What’s notable: The release of open-source weights for M2.1 delivers a massive leap over M2 on software engineering leaderboards. https://www.minimax.io/ Try it Use case Best‑practice prompt pattern End‑to‑end agentic coding (multi‑file edits, run‑fix loops) Treat the model as an autonomous coding agent, not a snippet generator. Explicitly require task decomposition and step‑by‑step execution, then a single consolidated result. Long‑horizon tool‑using agents (shell, browser, Python) Explicitly request stepwise planning and sequential tool use. M2.1’s interleaved thinking and improved instruction‑constraint handling are designed for complex, multi‑step analytical tasks that require evidence tracking and coherent synthesis, not conversational back‑and‑forth. Long‑context reasoning & analysis (large documents / logs) Declare the scope and desired output structure up front. MiniMax‑M2.1 performs best when the objective and final artifact are clear, allowing it to manage long context and maintain coherence. Because MiniMax‑M2.1 is designed to act as a long‑horizon analytical agent, it shines when you give it a clear end goal and let it work through large volumes of information—here’s a prompt a risk or compliance team could use in practice: You are a financial risk analysis agent. Analyze the following transaction logs and compliance policy documents to identify potential regulatory violations and systemic risk patterns. Plan your approach before executing. Work through the data step by step, referencing evidence where relevant. Deliver a final report with the following sections: Key Risk Patterns Identified, Supporting Evidence, Potential Regulatory Impact, Recommended Mitigations. Your response should be a complete, executive-ready report, not a conversational draft. Getting started You can deploy open‑source Hugging Face models directly in Microsoft Foundry by browsing the Hugging Face collection in the Foundry model catalog and deploying to managed endpoints in just a few clicks. You can also start from the Hugging Face Hub. First, select any supported model and then choose "Deploy on Microsoft Foundry", which brings you straight into Azure with secure, scalable inference already configured. Learn how to discover models and deploy them using Microsoft Foundry documentation. Follow along the Model Mondays series and access the GitHub to stay up to date on the latest Read Hugging Face on Azure docs Learn about one-click deployments from the Hugging Face Hub on Microsoft Foundry Explore models in Microsoft Foundry1.1KViews0likes0CommentsPublishing Agents from Microsoft Foundry to Microsoft 365 Copilot & Teams
Better Together is a series on how Microsoft’s AI platforms work seamlessly to build, deploy, and manage intelligent agents at enterprise scale. As organizations embrace AI across every workflow, Microsoft Foundry, Microsoft 365, Agent 365, and Microsoft Copilot Studio are coming together to deliver a unified approach—from development to deployment to day-to-day operations. This three-part series explores how these technologies connect to help enterprises build AI agents that are secure, governed, and deeply integrated with Microsoft’s product ecosystem. Series Overview Part 1: Publishing from Foundry to Microsoft 365 Copilot and Microsoft Teams Part 2: Foundry + Agent 365 — Native Integration for Enterprise AI Part 3: Microsoft Copilot Studio Integration with Foundry Agents This blog focuses on Part 1: Publishing from Foundry to Microsoft 365 Copilot—how developers can now publish agents built in Foundry directly to Microsoft 365 Copilot and Teams in just a few clicks. Build once. Publish everywhere. Developers can now take an AI agent built in Microsoft Foundry and publish it directly to Microsoft 365 Copilot and Microsoft Teams in just a few clicks. The new streamlined publishing flow eliminates manual setup across Entra ID, Azure Bot Service, and manifest files, turning hours of configuration into a seamless, guided flow in the Foundry Playground. Simplifying Agent Publishing for Microsoft 365 Copilot & Microsoft Teams Previously, deploying a Foundry AI agent into Microsoft 365 Copilot and Microsoft Teams required multiple steps: app registration, bot provisioning, manifest editing, and admin approval. With the new Foundry → M365 integration, the process is straightforward and intuitive. Key capabilities No-code publishing — Prepare, package, and publish agents directly from Foundry Playground. Unified build — A single agent package powers multiple Microsoft 365 channels, including Teams Chat, Microsoft 365 Copilot Chat, and BizChat. Agent-type agnostic — Works seamlessly whether you have a prompt agent, hosted agent, or workflow agent. Built-in Governance — Every agent published to your organization is automatically routed through Microsoft 365 Admin Center (MAC) for review, approval, and monitoring. Downloadable package — Developers can download a .zip for local testing or submission to the Microsoft Marketplace. For pro-code developers, the experience is also simplified. A C# code-first sample in the Agent Toolkit for Visual Studio is searchable, featured, and ready to use. Why It Matters This integration isn’t just about convenience; it’s about scale, control, and trust. Faster time to value — Deliver intelligent agents where people already work, without infrastructure overhead. Enterprise control — Admins retain full oversight via Microsoft 365 Admin Center, with built-in approval, review and governance flows. Developer flexibility — Both low-code creators and pro-code developers benefit from the unified publishing experience. Better Together — This capability lays the groundwork for Agent 365 publishing and deeper M365 integrations. Real-world scenarios YoungWilliams built Priya, an AI agent that helps handle government service inquiries faster and more efficiently. Using the one-click publishing flow, Priya was quickly deployed to Microsoft Teams and M365 Copilot without manual setup. This allowed Young Williams’ customers to provide faster, more accurate responses while keeping governance and compliance intact. “Integrating Microsoft Foundry with Microsoft 365 Copilot fundamentally changed how we deliver AI solutions to our government partners,” said John Tidwell, CTO of YoungWilliams. “With Foundry’s one-click publishing to Teams and Copilot, we can take an idea from prototype to production in days instead of weeks—while maintaining the enterprise-grade security and governance our clients expect. It’s a game changer for how public services can adopt AI responsibly and at scale.” Availability Publishing from Foundry to M365 is in Public Preview within the Foundry Playground. Developers can explore the preview in Microsoft Foundry and test the Teams / M365 publishing flow today. SDK and CLI extensions for code-first publishing are generally available. What’s Next in the Better Together Series This blog is part of the broader Better Together series connecting Microsoft Foundry, Microsoft 365, Agent 365, and Microsoft Copilot Studio. Continue the journey: Foundry + Agent 365 — Native Integration for Enterprise AI (Link) Start building today [Quickstart — Publish an Agent to Microsoft 365 ] Try it now in the new Foundry Playground3.1KViews0likes2CommentsAdvanced Function Calling and Multi-Agent Systems with Small Language Models in Foundry Local
Advanced Function Calling and Multi-Agent Systems with Small Language Models in Foundry Local In our previous exploration of function calling with Small Language Models, we demonstrated how to enable local SLMs to interact with external tools using a text-parsing approach with regex patterns. While that method worked, it required manual extraction of function calls from the model's output; functional but fragile. Today, I'm excited to show you something far more powerful: Foundry Local now supports native OpenAI-compatible function calling with select models. This update transforms how we build agentic AI systems locally, making it remarkably straightforward to create sophisticated multi-agent architectures that rival cloud-based solutions. What once required careful prompt engineering and brittle parsing now works seamlessly through standardized API calls. We'll build a complete multi-agent quiz application that demonstrates both the elegance of modern function calling and the power of coordinated agent systems. The full source code is available in this GitHub repository, but rather than walking through every line of code, we'll focus on how the pieces work together and what you'll see when you run it. What's New: Native Function Calling in Foundry Local As we explored in our guide to running Phi-4 locally with Foundry Local, we ran powerful language models on our local machine. The latest version now support native function calling for models specifically trained with this capability. The key difference is architectural. In our weather assistant example, we manually parsed JSON strings from the model's text output using regex patterns and frankly speaking, meticulously testing and tweaking the system prompt for the umpteenth time 🙄. Now, when you provide tool definitions to supported models, they return structured tool_calls objects that you can directly execute. Currently, this native function calling capability is available for the Qwen 2.5 family of models in Foundry Local. For this tutorial, we're using the 7B variant, which strikes a great balance between capability and resource requirements. Quick Setup Getting started requires just a few steps. First, ensure you have Foundry Local installed. On Windows, use winget install Microsoft.FoundryLocal , and on macOS, use bash brew install microsoft/foundrylocal/foundrylocal You'll need version 0.8.117 or later. Install the Python dependencies in the requirements file, then start your model. The first run will download approximately 4GB: foundry model run qwen2.5-7b-instruct-cuda-gpu If you don't have a compatible GPU, use the CPU version instead, or you can specify any other Qwen 2.5 variant that suits your hardware. I have set a DEFAULT_MODEL_ALIAS variable you can modify to use different models in utils/foundry_client.py file. Keep this terminal window open. The model needs to stay running while you develop and test your application. Understanding the Architecture Before we dive into running the application, let's understand what we're building. Our quiz system follows a multi-agent architecture where specialized agents handle distinct responsibilities, coordinated by a central orchestrator. The flow works like this: when you ask the system to generate a quiz about photosynthesis, the orchestrator agent receives your message, understands your intent, and decides which tool to invoke. It doesn't try to generate the quiz itself, instead, it calls a tool that creates a specialist QuizGeneratorAgent focused solely on producing well-structured quiz questions. Then there's another agent, reviewAgent, that reviews the quiz with you. The project structure reflects this architecture: quiz_app/ ├── agents/ # Base agent + specialist agents ├── tools/ # Tool functions the orchestrator can call ├── utils/ # Foundry client connection ├── data/ ├── quizzes/ # Generated quiz JSON files │── responses/ # User response JSON files └── main.py # Application entry point The orchestrator coordinates three main tools: generate_new_quiz, launch_quiz_interface, and review_quiz_interface. Each tool either creates a specialist agent or launches an interactive interface (Gradio), handling the complexity so the orchestrator can focus on routing and coordination. How Native Function Calling Works When you initialize the orchestrator agent in main.py, you provide two things: tool schemas that describe your functions to the model, and a mapping of function names to actual Python functions. The schemas follow the OpenAI function calling specification, describing each tool's purpose, parameters, and when it should be used. Here's what happens when you send a message to the orchestrator: The agent calls the model with your message and the tool schemas. If the model determines a tool is needed, it returns a structured tool_calls attribute containing the function name and arguments as a proper object—not as text to be parsed. Your code executes the tool, creates a message with "role": "tool" containing the result, and sends everything back to the model. The model can then either call another tool or provide its final response. The critical insight is that the model itself controls this flow through a while loop in the base agent. Each iteration represents the model examining the current state, deciding whether it needs more information, and either proceeding with another tool call or providing its final answer. You're not manually orchestrating when tools get called; the model makes those decisions based on the conversation context. Seeing It In Action Let's walk through a complete session to see how these pieces work together. When you run python main.py, you'll see the application connect to Foundry Local and display a welcome banner: Now type a request like "Generate a 5 question quiz about photosynthesis." Watch what happens in your console: The orchestrator recognized your intent, selected the generate_new_quiz tool, and extracted the topic and number of questions from your natural language request. Behind the scenes, this tool instantiated a QuizGeneratorAgent with a focused system prompt designed specifically for creating quiz JSON. The agent used a low temperature setting to ensure consistent formatting and generated questions that were saved to the data/quizzes folder. This demonstrates the first layer of the multi-agent architecture: the orchestrator doesn't generate quizzes itself. It recognizes that this task requires specialized knowledge about quiz structure and delegates to an agent built specifically for that purpose. Now request to take the quiz by typing "Take the quiz." The orchestrator calls a different tool and Gradio server is launched. Click the link to open in a browser window displaying your quiz questions. This tool demonstrates how function calling can trigger complex interactions—it reads the quiz JSON, dynamically builds a user interface with radio buttons for each question, and handles the submission flow. After you answer the questions and click submit, the interface saves your responses to the data/responses folder and closes the Gradio server. The orchestrator reports completion: The system now has two JSON files: one containing the quiz questions with correct answers, and another containing your responses. This separation of concerns is important—the quiz generation phase doesn't need to know about response collection, and the response collection doesn't need to know how quizzes are created. Each component has a single, well-defined responsibility. Now request a review. The orchestrator calls the third tool: A new chat interface opens, and here's where the multi-agent architecture really shines. The ReviewAgent is instantiated with full context about both the quiz questions and your answers. Its system prompt includes a formatted view of each question, the correct answer, your answer, and whether you got it right. This means when the interface opens, you immediately see personalized feedback: The Multi-Agent Pattern Multi-agent architectures solve complex problems by coordinating specialized agents rather than building monolithic systems. This pattern is particularly powerful for local SLMs. A coordinator agent routes tasks to specialists, each optimized for narrow domains with focused system prompts and specific temperature settings. You can use a 1.7B model for structured data generation, a 7B model for conversations, and a 4B model for reasoning, all orchestrated by a lightweight coordinator. This is more efficient than requiring one massive model for everything. Foundry Local's native function calling makes this straightforward. The coordinator reliably invokes tools that instantiate specialists, with structured responses flowing back through proper tool messages. The model manages the coordination loop—deciding when it needs another specialist, when it has enough information, and when to provide a final answer. In our quiz application, the orchestrator routes user requests but never tries to be an expert in quiz generation, interface design, or tutoring. The QuizGeneratorAgent focuses solely on creating well-structured quiz JSON using constrained prompts and low temperature. The ReviewAgent handles open-ended educational dialogue with embedded quiz context and higher temperature for natural conversation. The tools abstract away file management, interface launching, and agent instantiation, the orchestrator just knows "this tool launches quizzes" without needing implementation details. This pattern scales effortlessly. If you wanted to add a new capability like study guides or flashcards, you could just easily create a new tool or specialists. The orchestrator gains these capabilities automatically by having the tool schemas you have defined without modifying core logic. This same pattern powers production systems with dozens of specialists handling retrieval, reasoning, execution, and monitoring, each excelling in its domain while the coordinator ensures seamless collaboration. Why This Matters The transition from text-parsing to native function calling enables a fundamentally different approach to building AI applications. With text parsing, you're constantly fighting against the unpredictability of natural language output. A model might decide to explain why it's calling a function before outputting the JSON, or it might format the JSON slightly differently than your regex expects, or it might wrap it in markdown code fences. Native function calling eliminates this entire class of problems. The model is trained to output tool calls as structured data, separate from its conversational responses. The multi-agent aspect builds on this foundation. Because function calling is reliable, you can confidently delegate to specialist agents knowing they'll integrate smoothly with the orchestrator. You can chain tool calls—the orchestrator might generate a quiz, then immediately launch the interface to take it, based on a single user request like "Create and give me a quiz about machine learning." The model handles this orchestration intelligently because the tool results flow back as structured data it can reason about. Running everything locally through Foundry Local adds another dimension of value and I am genuinely excited about this (hopefully, the phi models get this functionality soon). You can experiment freely, iterate quickly, and deploy solutions that run entirely on your infrastructure. For educational applications like our quiz system, this means students can interact with the AI tutor as much as they need without cost concerns. Getting Started With Your Own Multi-Agent System The complete code for this quiz application is available in the GitHub repository, and I encourage you to clone it and experiment. Try modifying the tool schemas to see how the orchestrator's behavior changes. Add a new specialist agent for a different task. Adjust the system prompts to see how agent personalities and capabilities shift. Think about the problems you're trying to solve. Could they benefit from having different specialists handling different aspects? A customer service system might have agents for order lookup, refund processing, and product recommendations. A research assistant might have agents for web search, document summarization, and citation formatting. A coding assistant might have agents for code generation, testing, and documentation. Start small, perhaps with two or three specialist agents for a specific domain. Watch how the orchestrator learns to route between them based on the tool descriptions you provide. You'll quickly see opportunities to add more specialists, refine the existing ones, and build increasingly sophisticated systems that leverage the unique strengths of each agent while presenting a unified, intelligent interface to your users. In the next entry, we will be deploying our quizz app which will mark the end of our journey in Foundry and SLMs these past few weeks. I hope you are as excited as I am! Thanks for reading.374Views0likes0CommentsIntroducing Dragon HD Omni: Azure Speech New Voice Type Now in Preview via Microsoft Foundry
Dragon HD Omni is Microsoft Azure Speech’s newest text‑to‑speech generation, delivering over 700 high‑quality voices with enhanced expressiveness, multi‑lingual fluency, and multi‑style control — all through a unified model built in Microsoft Foundry. It removes common developer pain points such as unnatural voice prosody, limited language coverage, and heavy SSML tuning effort. The result is a powerful value proposition: faster integration, richer user experiences, and production‑ready voice output with minimal effort. Azure speech offers a broad range of unique voices for applications like virtual agents, audiobooks, podcasts, and speech-to-speech tasks. Demo video 700+ prebuilt voices Dragon HD Omni offers a range of prebuilt voices with distinct personas and emotions, supporting diverse use cases from agent-based applications to content creation. These voices unlock endless possibilities, empowering users to enhance end-to-end applications. Full update for previous generation voices Dragon HD Omni merges a wide range of prebuilt voices into one, improving contextual adaptation, prosody, expression, and keeping each voice's unique character. This technology delivers more accurate, flexible, and lifelike speech for a variety of uses. Dragon HD Omni raises the standard for natural AI voices across customer service, accessibility, and creative projects, advancing human-computer interaction. You can explore some voices from voice list, such as: "en-US-Ava:DragonHDOmniLatestNeural" "en-US-Andrew:DragonHDOmniLatestNeural" "en-US-Dana:DragonHDOmniLatestNeural" "en-US-Caleb:DragonHDOmniLatestNeural" "zh-CN-Xiaoyue:DragonHDOmniLatestNeural" "zh-CN-Yunqi:DragonHDOmniLatestNeural" "en-US-Phoebe:DragonHDOmniLatestNeural" "en-US-Lewis:DragonHDOmniLatestNeural" They will be available to try directly via Speech Playground - Microsoft Foundry Or, you can use this voice name format by adding the suffix `:DragonHDOmniLatestNeural` to try the Omni version of the given voice via direct SSML call. For example: Previous neural voice Omni version voice name de-DE-ConradNeural de-DE-Conrad:DragonHDOmniLatestNeural AI-Generated Voices Dragon HD Omni now features nearly 300 brand‑new AI‑generated voices, carefully designed to deliver an unprecedented range of vocal diversity. These voices aren’t just more of the same — they’re built to give you choice, flexibility, and creative control. With variations across: Gender – male, female, and non‑binary options Age – youthful, mature, and senior tones Pitch & tone – from warm and friendly to authoritative and professional This expanded library means you can: Personalize experiences for different audiences, whether you’re building an educational app, a customer support bot, or a storytelling platform. Strengthen brand identity by selecting voices that reflect your company’s personality and values. Increase inclusivity with diverse vocal styles that resonate across cultures and communities. Unlock creativity by experimenting with unique voice personalities for podcasts, games, or immersive experiences. Speaker name – Description Sample en-us-graphiterhodium - A bold and dramatic male voice en-us-olivepoivre - An adult female voice that is calm and soothing. Check the full Dragon HD Omni voice list at here. Styles control Standard Azure voices have limited styles due to extensive tuning requirements. The Dragon HD Omni introduces automatic style prediction using natural language descriptions, enabling advanced customization, broader style support, reduced cost, and improved expressiveness. In the initial release, styles will launch for en-US-Ava and en-US-Andrew. Supported styles angry, chill surfer, confused, curious, determined, disgusted, embarrassed, emo teenager, empathetic, encouraging, excited, fearful, friendly, grateful, joyful, mad scientist, meditative, narration, neutral, new yorker, news, reflective, regretful, relieved, sad, santa, shy, soft voice, surprised Note that style result will be strongly influenced by the input content. SSML example <speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xmlns:mstts="http://www.w3.org/2001/mstts" xml:lang="en-US"> <voice name="en-us-ava:DragonHDOmniLatestNeural"> <mstts:express-as style="cheerful"> Wow! What an amazing day! I feel so full of energy, and everything around me seems brighter. My voice is bubbling with excitement, and I can’t stop smiling. I’m ready to take on anything that comes my way—let’s celebrate this wonderful moment together! </mstts:express-as> </voice> </speak> Multilingual and Accents All Dragon HD Omni voices support multiple languages, with the capability that can automatically predicting and generating output based on the input text. Additionally, you may utilize the tag to adjust speaking languages and accents, such as fr-FR for French, de-DE for German, etc. For a comprehensive list of supported languages and their associated syntax and attributes, please refer to the lang element. SSML example <speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xmlns:mstts="http://www.w3.org/2001/mstts" xml:lang="en-US"><voice name="en-us-ava:Dragon HD OmniLatestNeural"><lang xml:lang="fr-FR"> Bonjour ! Ce matin, j’ai pris un café au jardin du Luxembourg. Il faisait frais, mais très agréable. Ensuite, j’ai acheté une baguette et quelques macarons. Paris est vraiment charmant.</lang> </voice> </speak> Word Boundary Event Support Dragon HD Omni supports the word boundary event, which allows developers to track the precise timing of each word as it is spoken. This feature is essential for applications requiring word-level synchronization, such as karaoke, real-time captioning, or interactive voice experiences. When the event fires, it provides: Text: The word spoken AudioOffset: The time offset in the audio stream (milliseconds) TextOffset: The position of the word in the input text Example: Python Sample Using Wordboundary Event in Azure Speech SDK import azure.cognitiveservices.speech as speechsdk def word_boundary_cb(evt): print(f"Word: '{evt.text}', AudioOffset: {evt.audio_offset / 10000}ms, TextOffset: {evt.text_offset}") speech_config = speechsdk.SpeechConfig(subscription="YourSubscriptionKey", region="YourServiceRegion") synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config) synthesizer.synthesis_word_boundary.connect(word_boundary_cb) ssml = """ <speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xmlns:mstts="http://www.w3.org/2001/mstts" xml:lang="en-US"> <voice name="en-us-ava:DragonHDOmniLatestNeural"> Hello Azure, welcome to Dragon HD Omni! </voice> </speak> """ result = synthesizer.speak_ssml_async(ssml).get() Sample Output: Word: 'Hello', AudioOffset: 110.0ms, TextOffset: 182 Word: 'Azure', AudioOffset: 590.0ms, TextOffset: 188 Word: ',', AudioOffset: 1110.0ms, TextOffset: 193 Word: 'welcome', AudioOffset: 1270.0ms, TextOffset: 195 Word: 'to', AudioOffset: 1750.0ms, TextOffset: 203 Word: 'Dragon HD Omni', AudioOffset: 1910.0ms, TextOffset: 206 Word: '!', AudioOffset: 2750.0ms, TextOffset: 216 Parameters Dragon HD Omni supports advanced parameter tuning to help you customize voice output for different scenarios. This guide explains each parameter in simple terms and provides recommendations for adjusting them based on your goals. Overview Parameter Default Range Purpose temperature 0.7 0.3 – 1.0 Controls creativity vs. stability top_p 0.7 0.3 – 1.0 Filters output for diversity top_k 22 1 – 50 Limits number of options considered cfg_scale 1.4 1.0 – 2.0 Adjusts relevance and speech speed Tuning for Expressiveness vs. Stability Higher values for temperature, top_p, and top_k result in more expressive, emotionally varied speech. Lower values produce more stable and predictable output. Recommendation: To increase expressiveness, raise all three parameters together. Keep top_p equal to temperature for best results. Tuning for Speed and Contextual Relevance cfg_scale affects how quickly the voice speaks and how well it aligns with the context. Higher values (e.g., 1.8–2.0): faster speech, stronger contextual relevance. Lower values (e.g., 1.0–1.2): slower speech, less contextual alignment. Suggested Tuning Strategies Goal Suggested Adjustment More expressive Increase temperature, top_p, and top_k together More stable Lower temperature first, then adjust top_p if needed Faster & relevant Increase cfg_scale Slower & neutral Decrease cfg_scale The following table describes the usage of the parameters above: Single parameter: <speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xmlns:mstts="http://www.w3.org/2001/mstts" xml:lang="en-US"> <voice name="en-us-ava:Dragon HD OmniLatestNeural" parameters="top_p=0.8"> Hello Azure! </voice> </speak> Multiple parameters: <speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xmlns:mstts="http://www.w3.org/2001/mstts" xml:lang="en-US"> <voice name="en-us-ava:Dragon HD OmniLatestNeural" parameters="top_p=0.8;top_k=22;temperature=0.7;cfg_scale=1.2"> Hello Azure! Hello Azure! </voice> </speak> Get Started In our ongoing journey to enhance multilingual capabilities in text to speech (TTS) technology, we strive to deliver the best voices to empower your applications. Our voices are designed to be incredibly adaptive, seamlessly switching between languages based on the text input. They deliver natural-sounding speech with precise pronunciation and prosody, making them invaluable for applications like language learning, travel guidance, and international business communication. Microsoft offers an extensive portfolio of over 600 neural voices, covering more than 150 languages and locales. These TTS voices can quickly add read-aloud functionality for a more accessible app design or provide a voice to chatbots, elevating the conversational experience for users. With the Custom Neural Voice capability, businesses can also create unique and distinctive brand voices effortlessly. With these advancements, we continue to push the boundaries of what’s possible in TTS technology, ensuring that our users have access to the most versatile, high-quality voices for their needs. For more information Try our demo to listen to existing neural voices Add Text to speech to your apps today Apply for access to Custom Neural Voice Join Discord to collaborate and share feedback Contact us ttsvoicefeedback@microsoft.com2.2KViews0likes0Comments