natural language processing
Now in Foundry: Qwen3-Coder-Next, Qwen3-ASR-1.7B, Z-Image
This week's spotlight features three models that demonstrate enterprise-grade AI across the full scope of modalities. From low-latency coding agents to state-of-the-art multilingual speech recognition and foundation-quality image generation, these models showcase the breadth of innovation happening in open-source AI. Each model balances performance with practical deployment considerations, making them viable for production systems while pushing the boundaries of what's possible in their respective domains.

This week's Model Mondays edition highlights Qwen3-Coder-Next, an 80B MoE model that activates only 3B parameters while delivering coding-agent capabilities with a 256k context; Qwen3-ASR-1.7B, which achieves state-of-the-art accuracy across 52 languages and dialects; and Z-Image from Tongyi-MAI, an undistilled text-to-image foundation model with full Classifier-Free Guidance support for professional creative workflows.

Models of the week

Qwen: Qwen3-Coder-Next

Model Specs
- Parameters / size: 80B total (3B activated)
- Context length: 262,144 tokens
- Primary task: Text generation (coding agents, tool use)

Why it's interesting
- Extreme efficiency: activates only 3B of 80B parameters while delivering performance comparable to models with 10-20x more active parameters, making advanced coding agents viable for local deployment on consumer hardware.
- Built for agentic workflows: excels at long-horizon reasoning, complex tool usage, and recovery from execution failures, capabilities critical for autonomous development that go beyond simple code completion.
- Benchmarks: competitive performance with significantly larger models on SWE-bench and other coding benchmarks (Technical Report).

Try it
- Code generation with tool use: provide task context, available tools, and execution environment details.
- Long-context refactoring: include full codebase context within the 256k window with specific refactoring goals.
- Autonomous debugging: present error logs, stack traces, and relevant code with failure recovery instructions.
- Multi-file code synthesis: describe architecture requirements and file structure expectations.

Financial services sample prompt:
You are a coding agent for a fintech platform. Implement a transaction reconciliation service that processes batches of transactions, detects discrepancies between internal records and bank statements, and generates audit reports. Use the provided database connection tool, logging utility, and alert system. Handle edge cases including partial matches, timing differences, and duplicate transactions. Include unit tests with 90%+ coverage.

Qwen: Qwen3-ASR-1.7B

Model Specs
- Parameters / size: 1.7B
- Context length: 256 tokens (default), configurable up to 4,096
- Primary task: Automatic speech recognition (multilingual)

Why it's interesting
- All-in-one multilingual capability: a single 1.7B model handles language identification plus speech recognition for 30 languages, 22 Chinese dialects, and English accents from multiple regions, eliminating the need to manage separate models per language.
- Specialized audio versatility: transcribes not just clean speech but singing voice, songs with background music, and extended audio files, expanding use cases beyond traditional ASR to entertainment and media workflows.
- State-of-the-art accuracy: outperforms GPT-4o, Gemini-2.5, and Whisper-large-v3 across multiple benchmarks.
  English (Tedlium): 4.50 WER vs 7.69/6.15/6.84; Chinese (WenetSpeech): 4.97/5.88 WER vs 15.30/14.43/9.86 (Technical Paper).
- Language ID included: 97.9% average accuracy across benchmark datasets for automatic language identification, eliminating the need for separate language detection pipelines.

Try it
- Multilingual transcription: send audio files via API with automatic language detection.
- Call center analytics: process customer service recordings to extract transcripts and identify languages.
- Content moderation: transcribe user-generated audio content across multiple languages.
- Meeting transcription: convert multilingual meeting recordings to text for documentation.

Customer support sample prompt:
Deploy Qwen3-ASR-1.7B to a Microsoft Foundry endpoint and transcribe multilingual customer service calls. Send audio files via API to automatically detect the language (from 52 supported options including 30 languages and 22 Chinese dialects) and generate accurate transcripts. Process calls from customers speaking English, Spanish, Mandarin, Cantonese, Arabic, French, and other languages without managing separate models per language. Use transcripts for quality assurance, compliance monitoring, and customer sentiment analysis.

Tongyi-MAI: Z-Image

Model Specs
- Parameters / size: 6B
- Context length: N/A (text-to-image)
- Primary task: Text-to-image generation

Why it's interesting
- Undistilled foundation model: the full-capacity base, trained without distillation, preserves the complete training signal and supports Classifier-Free Guidance (a technique that improves prompt adherence and output quality), enabling complex prompt engineering and negative prompting that distilled models cannot achieve.
- High output diversity: generates distinct character identities in multi-person scenes with varied compositions, facial features, and lighting, critical for creative applications requiring visual variety rather than consistency.
- Aesthetic versatility: handles diverse visual styles from hyper-realistic photography to anime and stylized illustrations within a single model, supporting resolutions from 512×512 to 2048×2048 at any aspect ratio with 28-50 inference steps (Technical Paper).

Try it

E-commerce sample prompt:
Professional product photography of a modern ergonomic office chair in a bright Scandinavian-style home office. Natural window lighting from left, clean white desk with laptop and succulent plant, light oak hardwood floor. Chair positioned at 45-degree angle showing design details. Photorealistic, commercial photography, sharp focus, 85mm lens, f/2.8, soft shadows.

Getting started
You can deploy open-source Hugging Face models directly in Microsoft Foundry by browsing the Hugging Face collection in the Foundry model catalog and deploying to managed endpoints in just a few clicks. You can also start from the Hugging Face Hub: select any supported model and choose "Deploy on Microsoft Foundry", which brings you straight into Azure with secure, scalable inference already configured. Learn how to discover models and deploy them using the Microsoft Foundry documentation.
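Once a model from this post is deployed to a managed Foundry endpoint, calling it is a standard chat-completions request. Below is a minimal sketch using the azure-ai-inference Python package against a hypothetical Qwen3-Coder-Next deployment; the environment variables for the endpoint and key are placeholders for your own deployment's values, and the exact URL format can vary by deployment type.

```python
# pip install azure-ai-inference
import os

from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import SystemMessage, UserMessage
from azure.core.credentials import AzureKeyCredential

# Placeholders: point these at your own Foundry managed endpoint and key.
endpoint = os.environ["FOUNDRY_ENDPOINT"]
api_key = os.environ["FOUNDRY_API_KEY"]

client = ChatCompletionsClient(endpoint=endpoint, credential=AzureKeyCredential(api_key))

# Reuse the financial-services sample prompt from this post as the user message.
response = client.complete(
    messages=[
        SystemMessage(content="You are a coding agent for a fintech platform."),
        UserMessage(content=(
            "Implement a transaction reconciliation service that processes batches of "
            "transactions, detects discrepancies between internal records and bank "
            "statements, and generates audit reports. Include unit tests."
        )),
    ],
    max_tokens=2048,
    temperature=0.2,
)

print(response.choices[0].message.content)
```

The same pattern works for any chat-capable model deployed from the Hugging Face collection; only the endpoint and the prompt change.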
Follow along with the Model Mondays series and access the GitHub repo to stay up to date on the latest:
- Read the Hugging Face on Azure docs
- Learn about one-click deployments from the Hugging Face Hub on Microsoft Foundry
- Explore models in Microsoft Foundry

What is trending in Hugging Face on Microsoft Foundry? Feb 2, 2026
Open-source AI is moving fast, with important breakthroughs in reasoning, agentic systems, multimodality, and efficiency emerging every day. Hugging Face has been a leading platform where researchers, startups, and developers share and discover new models. Microsoft Foundry brings these trending Hugging Face models into a production-ready experience, where developers can explore, evaluate, and deploy them within their Azure environment. Our weekly Model Mondays series highlights Hugging Face models available in Foundry, focusing on what matters most to developers: why a model is interesting, where it fits, and how to put it to work quickly.

This week's Model Mondays edition highlights three Hugging Face models: a powerful Mixture-of-Experts model from Z.AI designed for lightweight deployment, Meta's unified foundation model for image and video segmentation, and MiniMax's latest open-source agentic model optimized for complex workflows.

Models of the week

Z.AI's GLM-4.7-Flash

Model Basics
- Model name: zai-org/GLM-4.7-Flash
- Parameters / size: 30B total / 3B active
- Default settings: 131,072 max new tokens
- Primary task: Agentic, reasoning, and coding

Why this model matters
- Why it's interesting: it uses a Mixture-of-Experts (MoE) architecture (30B total parameters, 3B active) to offer a new option for lightweight deployment. It demonstrates strong performance on logic and reasoning benchmarks, outperforming similarly sized models such as gpt-oss-20b on AIME 25 and GPQA, and it supports advanced inference features like a "Preserved Thinking" mode for multi-turn agentic tasks.
- Best-fit use cases: lightweight local deployment, multi-turn agentic tasks, and logical reasoning applications.
- What's notable: from the Foundry catalog, users can deploy on an A100 instance, or run unsloth/GLM-4.7-Flash-GGUF on a CPU. GLM-4.7-Flash achieves open-source SOTA scores among models of comparable size and, compared to similarly sized models, demonstrates superior frontend and backend development capabilities. Click to see more: https://docs.z.ai

Try it
- Agentic coding (multi-step repo work, debugging, refactoring): treat the model as an autonomous coding agent, not a snippet generator. Explicitly require task decomposition and step-by-step execution, then a single consolidated result.
- Long-context agent workflows (local or low-cost autonomous agents): call out long-horizon consistency and context preservation. Instruct the model to retain earlier assumptions and decisions across turns.

Now that you know GLM-4.7-Flash works best when you give it a clear goal and let it reason through a bounded task, here's an example prompt that a product or engineering team might use to identify risks and propose mitigations:

You are a software reliability analyst for a mid-scale SaaS platform. Review recent incident reports, production logs, and customer issues to uncover edge-case failures outside normal usage (e.g., rare inputs, boundary conditions, timing/concurrency issues, config drift, or unexpected feature interactions). Prioritize low-frequency, high-impact risks that standard testing misses. Recommend minimal, low-cost fixes (validation, guardrails, fallback logic, or documentation). Deliver a concise executive summary with sections: Observed Edge Cases, Root Causes, User Impact, Recommended Lightweight Fixes, and Validation Steps.
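Since the post notes that the unsloth GGUF build can run on a CPU, here is a minimal, hedged sketch of loading that quantized variant locally with llama-cpp-python. The quantization filename pattern, context size, and thread count below are assumptions for illustration; check the actual file names on the unsloth/GLM-4.7-Flash-GGUF model page before running.

```python
# pip install llama-cpp-python huggingface_hub
from llama_cpp import Llama

# Assumed repo and filename pattern; verify against the model card.
llm = Llama.from_pretrained(
    repo_id="unsloth/GLM-4.7-Flash-GGUF",
    filename="*Q4_K_M.gguf",   # pick a quantization level that fits your RAM
    n_ctx=8192,                # a modest context window for CPU-only runs
    n_threads=8,               # tune to your core count
)

result = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a software reliability analyst for a SaaS platform."},
        {"role": "user", "content": "List three classes of edge-case failures that standard testing often misses."},
    ],
    max_tokens=512,
    temperature=0.3,
)

print(result["choices"][0]["message"]["content"])
```

For GPU-backed deployment, the managed endpoint path described in the Getting started section later in this post is the simpler route.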
Meta's Segment Anything 3 (SAM3)

Model Basics
- Model name: facebook/sam3
- Parameters / size: 0.9B
- Primary task: Mask generation, Promptable Concept Segmentation (PCS)

Why this model matters
- Why it's interesting: it handles a vastly larger set of open-vocabulary prompts than SAM 2 and unifies image and video segmentation capabilities. It includes a "SAM 3 Tracker" mode that acts as a drop-in replacement for SAM 2 workflows with improved performance.
- Best-fit use cases: open-vocabulary object detection, video object tracking, and automatic mask generation.
- What's notable: introduces Promptable Concept Segmentation (PCS), allowing users to find all matching objects (e.g., "dial") via a text prompt rather than just single instances.

Try it
This model enables users to identify specific objects within video footage and isolate them over extended periods. With just one line of code, it is possible to detect multiple similar objects simultaneously. The accompanying GIF demonstrates how SAM3 efficiently highlights players wearing white on the field as they appear and disappear from view. Additional examples are available in the repository: https://github.com/facebookresearch/sam3/blob/main/assets/player.gif

- Text-prompted concept segmentation (images): treat SAM 3 as a concept detector, not an interactive click tool. Use short, concrete noun-phrase concept prompts instead of describing the scene or asking questions. Example prompt: "yellow school bus" or "shipping containers". Avoid verbs or full sentences.
- Video segmentation and object tracking: specify the same concept prompt once, then apply it across the video sequence. Do not restate the prompt per frame; let the model maintain identity continuity. Example: "person wearing a red jersey".
- Hard-to-name or visually subtle objects: use exemplar-based prompts (an image region or box) when text alone is ambiguous. Optionally combine positive and negative exemplars to refine the concept. Avoid over-constraining with long descriptions.

Using the GIF above as a leading example, the prompt below shows how SAM 3 turns raw sports footage into structured, reusable data. By identifying and tracking players based on visual concepts like jersey color, sports leagues can turn tracked data into interactive experiences where automated player identification relays stats, fun facts, and more when built into a larger application. Here is a prompt that will let you start identifying specific players across video:

Act as a sports analytics operator analyzing football match footage. Segment and track all football players wearing blue jerseys across the video. Generate pixel-accurate segmentation masks for each player and assign persistent instance IDs that remain stable during camera movement, zoom, and player occlusion. Exclude referees, opposing team jerseys, sidelines, and crowd. Output frame-level masks and tracking metadata suitable for overlays, player statistics, and downstream analytics pipelines.

MiniMax AI's MiniMax-M2.1

Model Basics
- Model name: MiniMaxAI/MiniMax-M2.1
- Parameters / size: 229B total / 10B active
- Default settings: 200,000 max new tokens
- Primary task: Agentic and coding

Why this model matters
- Why it's interesting: it is optimized for robustness in coding, tool use, and long-horizon planning, outperforming Claude Sonnet 4.5 in multilingual scenarios. It excels in full-stack application development, capable of architecting apps "from zero to one".
  Where previous coding models focused on Python optimization, M2.1 brings enhanced capabilities in Rust, Java, Golang, C++, Kotlin, Objective-C, TypeScript, JavaScript, and other languages, and it delivers exceptional stability across various coding agent frameworks.
- Best-fit use cases: lightweight local deployment, multi-turn agentic tasks, and logical reasoning applications.
- What's notable: the release of open-source weights for M2.1 delivers a massive leap over M2 on software engineering leaderboards. https://www.minimax.io/

Try it
- End-to-end agentic coding (multi-file edits, run-fix loops): treat the model as an autonomous coding agent, not a snippet generator. Explicitly require task decomposition and step-by-step execution, then a single consolidated result.
- Long-horizon tool-using agents (shell, browser, Python): explicitly request stepwise planning and sequential tool use. M2.1's interleaved thinking and improved instruction-constraint handling are designed for complex, multi-step analytical tasks that require evidence tracking and coherent synthesis, not conversational back-and-forth. A minimal code sketch of this kind of tool-use loop appears after the Getting started links below.
- Long-context reasoning and analysis (large documents / logs): declare the scope and desired output structure up front. MiniMax-M2.1 performs best when the objective and final artifact are clear, allowing it to manage long context and maintain coherence.

Because MiniMax-M2.1 is designed to act as a long-horizon analytical agent, it shines when you give it a clear end goal and let it work through large volumes of information. Here's a prompt a risk or compliance team could use in practice:

You are a financial risk analysis agent. Analyze the following transaction logs and compliance policy documents to identify potential regulatory violations and systemic risk patterns. Plan your approach before executing. Work through the data step by step, referencing evidence where relevant. Deliver a final report with the following sections: Key Risk Patterns Identified, Supporting Evidence, Potential Regulatory Impact, Recommended Mitigations. Your response should be a complete, executive-ready report, not a conversational draft.

Getting started
You can deploy open-source Hugging Face models directly in Microsoft Foundry by browsing the Hugging Face collection in the Foundry model catalog and deploying to managed endpoints in just a few clicks. You can also start from the Hugging Face Hub: select any supported model and choose "Deploy on Microsoft Foundry", which brings you straight into Azure with secure, scalable inference already configured. Learn how to discover models and deploy them using the Microsoft Foundry documentation.

Follow along with the Model Mondays series and access the GitHub repo to stay up to date on the latest:
- Read the Hugging Face on Azure docs
- Learn about one-click deployments from the Hugging Face Hub on Microsoft Foundry
- Explore models in Microsoft Foundry
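As promised above, here is a minimal sketch of the long-horizon tool-use loop described in the MiniMax-M2.1 guidance, written against an OpenAI-compatible chat endpoint. The base URL, API key variables, deployment name ("MiniMax-M2.1"), and the read_log_file tool are all placeholders and assumptions for illustration, not part of the original post.

```python
# pip install openai
import json
import os

from openai import OpenAI

# Placeholders: a Foundry managed endpoint that exposes an OpenAI-compatible chat API.
client = OpenAI(
    base_url=os.environ["FOUNDRY_OPENAI_BASE_URL"],
    api_key=os.environ["FOUNDRY_API_KEY"],
)

# One illustrative tool; a real agent would register shell, browser, or Python tools here.
tools = [{
    "type": "function",
    "function": {
        "name": "read_log_file",
        "description": "Return the contents of a named transaction log file",
        "parameters": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"],
        },
    },
}]

def read_log_file(path: str) -> str:
    # Hypothetical stub; replace with real file or database access.
    return "2026-02-02 09:13 txn 4411 amount mismatch: internal 120.00 vs bank 102.00"

messages = [
    {"role": "system", "content": "You are a financial risk analysis agent. Plan, then use tools step by step."},
    {"role": "user", "content": "Review log file 'reconciliation.log' and summarize key risk patterns."},
]

# A simple run-until-done loop: execute any tool calls the model requests, feed results back.
while True:
    resp = client.chat.completions.create(model="MiniMax-M2.1", messages=messages, tools=tools)
    msg = resp.choices[0].message
    if not msg.tool_calls:
        print(msg.content)  # final, consolidated report
        break
    messages.append(msg)
    for call in msg.tool_calls:
        args = json.loads(call.function.arguments)
        result = read_log_file(**args)
        messages.append({"role": "tool", "tool_call_id": call.id, "content": result})
```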
Securing Azure AI Applications: A Deep Dive into Emerging Threats | Part 1

Why AI Security Can't Be Ignored
Generative AI is rapidly reshaping how enterprises operate: accelerating decision-making, enhancing customer experiences, and powering intelligent automation across critical workflows. But as organizations adopt these capabilities at scale, a new challenge emerges: AI introduces security risks that traditional controls cannot fully address. AI models interpret natural language, rely on vast datasets, and behave dynamically. This flexibility enables innovation, but it also creates unpredictable attack surfaces that adversaries are actively exploiting. As AI becomes embedded in business-critical operations, securing these systems is no longer optional; it is essential.

The New Reality of AI Security
The threat landscape surrounding AI is evolving faster than any previous technology wave. Attackers are no longer focused solely on exploiting infrastructure or APIs; they are targeting the intelligence itself: the model, its prompts, and its underlying data. These AI-specific attack vectors can:
- Expose sensitive or regulated data
- Trigger unintended or harmful actions
- Skew decisions made by AI-driven processes
- Undermine trust in automated systems
As AI becomes deeply integrated into customer journeys, operations, and analytics, the impact of these attacks grows exponentially.

Why These Threats Matter
Threats such as prompt manipulation and model tampering go beyond technical issues; they strike at the foundational principles of trustworthy AI. They affect:
- Confidentiality: preventing accidental or malicious exposure of sensitive data through manipulated prompts.
- Integrity: ensuring outputs remain accurate, unbiased, and free from tampering.
- Reliability: maintaining consistent model behavior even when adversaries attempt to deceive or mislead the system.
When these pillars are compromised, the consequences extend across the business:
- Incorrect or harmful AI recommendations
- Regulatory and compliance violations
- Damage to customer trust
- Operational and financial risk
In regulated sectors, these threats can also impact audit readiness, risk posture, and long-term credibility. Understanding why these risks matter builds the foundation. In the upcoming blogs, we'll explore how these threats work and practical steps to mitigate them using Azure AI's security ecosystem.

Why AI Security Remains an Evolving Discipline
Traditional security frameworks, built around identity, network boundaries, and application hardening, do not fully address how AI systems operate. Generative models introduce unique and constantly shifting challenges:
- Dynamic model behavior: models adapt to context and data, creating a fluid and unpredictable attack surface.
- Natural language interfaces: prompts are unstructured and expressive, making sanitization inherently difficult.
- Data-driven risks: training and fine-tuning pipelines can be manipulated, poisoned, or misused.
- Rapidly emerging threats: attack techniques evolve faster than most defensive mechanisms, requiring continuous learning and adaptation.
Microsoft and other industry leaders are responding with robust tools (Azure AI Content Safety, Prompt Shields, Responsible AI frameworks, encryption, isolation patterns), but technology alone cannot eliminate risk. True resilience requires a combination of tooling, governance, awareness, and proactive operational practices.

Let's Build a Culture of Vigilance
AI security is not just a technical requirement; it is a strategic business necessity.
Effective protection requires collaboration across:
- Developers
- Data and AI engineers
- Cybersecurity teams
- Cloud platform teams
- Leadership and governance functions
Security for AI is a shared responsibility. Organizations must cultivate awareness, adopt secure design patterns, and continuously monitor for evolving attack techniques. Building this culture of vigilance is critical for long-term success.

Key Takeaways
AI brings transformative value, but it also introduces risks that evolve as quickly as the technology itself. Strengthening your AI security posture requires more than robust tooling: it demands responsible AI practices, strong governance, and proactive monitoring. By combining Azure's built-in security capabilities with disciplined operational practices, organizations can ensure their AI systems remain secure, compliant, and trustworthy, even as new threats emerge.

What's Next?
In future blogs, we'll explore two of the most important AI threats, Prompt Injection and Model Manipulation, and share actionable strategies to mitigate them using Azure AI's security capabilities. Stay tuned for practical guidance, real-world scenarios, and Microsoft-backed best practices to keep your AI applications secure.
The Future of AI: The Model is Key, but the App is the Doorway

This post explores the real-world impact of GPT-5 beyond benchmark scores, focusing on how application design shapes user experience. It highlights early developer feedback, common integration challenges, and practical strategies for adapting apps to leverage the advanced capabilities of GPT-5 in Foundry Models. From prompt refinement to fine-tuning to new API controls, learn how to make the most of this powerful model.
The Future of AI: Building Weird, Warm, and Wildly Effective AI Agents

Discover how humor and heart can transform AI experiences. From the playful Emotional Support Goose to the productivity-driven Penultimate Penguin, this post explores why designing with personality matters, and how Azure AI Foundry empowers creators to build tools that are not just efficient, but engaging.
The Future of AI: Structured Vibe Coding - An Improved Approach to AI Software Development

In this post from The Future of AI series, the author introduces structured vibe coding, a method for managing AI agents like a software team using specs, GitHub issues, and pull requests. By applying this approach with GitHub Copilot, they automated a repetitive task (answering Microsoft Excel-based questionnaires) while demonstrating how AI can enhance developer workflows without replacing human oversight. The result is a scalable, collaborative model for AI-assisted software development.
The Future of AI: Vibe Code with Adaptive Custom Translation

This blog explores how vibe coding, a conversational, flow-based development approach, was used to build the AdaptCT playground in Azure AI Foundry. It walks through setting up a productive coding environment with GitHub Copilot in Visual Studio Code, configuring the Copilot agent, and building a translation playground using Adaptive Custom Translation (AdaptCT). The post includes real-world code examples, architectural insights, and advanced UI patterns. It also highlights how AdaptCT fine-tunes LLM outputs using domain-specific reference sentence pairs, enabling more accurate and context-aware translations. The blog concludes with best practices for vibe coding teams and a forward-looking view of AI-augmented development paradigms.
Building Enterprise Voice-Enabled AI Agents with Azure Voice Live API

The sample application covered in this post demonstrates two approaches in an end-to-end solution that includes product search, order management, automated shipment creation, intelligent analytics, and comprehensive business intelligence through Microsoft Fabric integration.

Use Case Scenario: Retail Fashion Agent

Core Business Capabilities:
- Product Discovery and Ordering: natural language product search across fashion categories (winter wear, active wear, etc.) and order placement. REST APIs hosted in Azure Function Apps provide this functionality, and a Swagger definition is configured in the application for tool actions.
- Automated Fulfillment: integration with Azure Logic Apps for shipment creation in Azure SQL Database.
- Policy Support: vector-powered Q&A for returns, payment issues, and customer policies. Azure AI Search and File Search capabilities are used for this requirement.
- Conversation Analytics: AI-powered analysis using GPT-4o for sentiment scoring and performance evaluation. The application captures the entire conversation between the customer and the agent and sends it to an agent running in Azure Logic Apps to perform call quality assessment before storing the results in Azure Cosmos DB. When the customer indicates during the voice call that the conversation can be concluded, the agent autonomously sends the conversation history to the Azure Logic App to perform quality assessment.

Advanced Analytics Pipeline:
- Real-time Data Mirroring: automatic synchronization from Azure Cosmos DB to Microsoft Fabric OneLake
- Business Intelligence: custom Data Agents in Fabric for trend analysis and insights
- Executive Dashboards: Power BI reports for comprehensive performance monitoring

Technical Architecture Overview
The solution presents two approaches, each optimized for different enterprise scenarios.

🎯 Approach 1: Direct Model Integration with GPT-Realtime

Architecture Components
This approach provides direct integration with the Azure Voice Live API using the GPT-Realtime model for immediate speech-to-speech conversational experiences without intermediate text processing. The application connects to the Voice Live API over a WebSocket connection; the semantics of this API are similar to those used when connecting to the GPT-Realtime API directly. The Voice Live API provides additional configurability, such as the choice of a custom voice from Azure Speech Services, options for echo cancellation and noise reduction, and the ability to plug in an Avatar integration.
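As a rough illustration of the WebSocket semantics described above, the sketch below shows how a client might push a session configuration to the service after connecting. The "session.update" event envelope follows the Realtime-style protocol this post builds on, but the exact field names and the URL construction are assumptions to verify against the Voice Live API documentation; the full session configurations used by the two approaches are shown in the sections that follow.

```python
# pip install websockets
# A minimal, hedged sketch: connect and send a session configuration.
import asyncio
import json
import os

import websockets

async def configure_session():
    # Placeholder URL; a real client builds this from the Foundry endpoint,
    # API version, and model or agent parameters, as shown later in this post.
    ws_url = os.environ["VOICE_LIVE_WS_URL"]
    headers = {"Authorization": f"Bearer {os.environ['AZURE_ACCESS_TOKEN']}"}

    async with websockets.connect(ws_url, additional_headers=headers) as ws:
        # Assumed Realtime-style envelope: session settings ride in a
        # "session.update" event sent as JSON text over the socket.
        await ws.send(json.dumps({
            "type": "session.update",
            "session": {
                "input_audio_sampling_rate": 24000,
                "turn_detection": {"type": "server_vad", "threshold": 0.5},
                "voice": {"name": "en-IN-AartiIndicNeural", "type": "azure-standard"},
            },
        }))
        ack = await ws.recv()  # the service replies with a session-updated event
        print(ack)

asyncio.run(configure_session())
```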
Core Technical Stack:
- GPT-Realtime Model: direct audio-to-audio processing
- Azure Speech Voice: high-quality TTS synthesis (en-IN-AartiIndicNeural)
- WebSocket Communication: real-time bidirectional audio streaming
- Voice Activity Detection: server-side VAD for natural conversation flow
- Client-Side Function Calling: full control over tool execution logic

Key Session Configuration
The Direct Model Integration uses the session configuration below:

```python
session_config = {
    "input_audio_sampling_rate": 24000,
    "instructions": system_instructions,
    "turn_detection": {
        "type": "server_vad",
        "threshold": 0.5,
        "prefix_padding_ms": 300,
        "silence_duration_ms": 500,
    },
    "tools": tools_list,
    "tool_choice": "auto",
    "input_audio_noise_reduction": {"type": "azure_deep_noise_suppression"},
    "input_audio_echo_cancellation": {"type": "server_echo_cancellation"},
    "voice": {
        "name": "en-IN-AartiIndicNeural",
        "type": "azure-standard",
        "temperature": 0.8,
    },
    "input_audio_transcription": {"model": "whisper-1"},
}
```

Configuration Highlights:
- 24 kHz audio sampling: high-quality audio processing for natural speech
- Server VAD: optimized threshold (0.5) with 300 ms padding for natural conversation flow
- Azure Deep Noise Suppression: advanced noise reduction for clear audio
- Indic voice support: en-IN-AartiIndicNeural for a localized customer experience
- Whisper-1 transcription: accurate speech recognition for conversation logging

Connecting to the Azure Voice Live API
The voicelive_modelclient.py demonstrates advanced WebSocket handling for real-time audio streaming:

```python
def get_websocket_url(self, access_token: str) -> str:
    """Generate WebSocket URL for Voice Live API."""
    azure_ws_endpoint = endpoint.rstrip("/").replace("https://", "wss://")
    return (
        f"{azure_ws_endpoint}/voice-live/realtime?api-version={api_version}"
        f"&model={model_name}"
        f"&agent-access-token={access_token}"
    )

async def connect(self):
    if self.is_connected():
        # raise Exception("Already connected")
        self.log("Already connected")

    # Get access token
    access_token = self.get_azure_token()

    # Build WebSocket URL and headers
    ws_url = self.get_websocket_url(access_token)
    self.ws = await websockets.connect(
        ws_url,
        additional_headers={
            "Authorization": f"Bearer {self.get_azure_token()}",
            "x-ms-client-request-id": str(uuid.uuid4()),
        },
    )
    print(f"Connected to Azure Voice Live API....")
    asyncio.create_task(self.receive())
    await self.update_session()
```

Function Calling Implementation
The Direct Model Integration provides client-side function execution with complete control:

```python
tools_list = [
    {
        "type": "function",
        "name": "perform_search_based_qna",
        "description": "call this function to respond to the user query on Contoso retail policies, procedures and general QnA",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
    {
        "type": "function",
        "name": "create_delivery_order",
        "description": "call this function to create a delivery order based on order id and destination location",
        "parameters": {
            "type": "object",
            "properties": {
                "order_id": {"type": "string"},
                "destination": {"type": "string"},
            },
            "required": ["order_id", "destination"],
        },
    },
    {
        "type": "function",
        "name": "perform_call_log_analysis",
        "description": "call this function to analyze call log based on input call log conversation text",
        "parameters": {
            "type": "object",
            "properties": {
                "call_log": {"type": "string"},
            },
            "required": ["call_log"],
        },
    },
    {
        "type": "function",
        "name": "search_products_by_category",
        "description": "call this function to search for products by category",
category", "parameters": { "type": "object", "properties": { "category": {"type": "string"}, }, "required": ["category"], }, }, { "type": "function", "name": "order_products", "description": "call this function to order products by product id and quantity", "parameters": { "type": "object", "properties": { "product_id": {"type": "string"}, "quantity": {"type": "integer"}, }, "required": ["product_id", "quantity"], }, } ] 🤖 Approach 2: Azure AI Foundry Agent Integration Architecture Components This approach leverages existing Azure AI Foundry Service Agents, providing enterprise-grade voice capabilities as a clean wrapper over pre-configured agents. It does not entail any code changes to the Agent itself to voice enable it. Core Technical Stack: Azure Fast Transcript: Advanced multi-language speech-to-text processing Azure AI Foundry Agent: Pre-configured Agent with autonomous capabilities GPT-4o-mini Model: Agent-configured model for text processing Neural Voice Synthesis: Indic language optimized TTS Semantic VAD: Azure semantic voice activity detection Session Configuration The Agent Integration approach uses advanced semantic voice activity detection: session_config = { "input_audio_sampling_rate": 24000, "turn_detection": { "type": "azure_semantic_vad", "threshold": 0.3, "prefix_padding_ms": 200, "silence_duration_ms": 200, "remove_filler_words": False, "end_of_utterance_detection": { "model": "semantic_detection_v1", "threshold": 0.01, "timeout": 2, }, }, "input_audio_noise_reduction": {"type": "azure_deep_noise_suppression"}, "input_audio_echo_cancellation": {"type": "server_echo_cancellation"}, "voice": { "name": "en-IN-AartiIndicNeural", "type": "azure-standard", "temperature": 0.8, }, "input_audio_transcription": {"model": "azure-speech", "language": "en-IN, hi-IN"}, } Key Differentiators: Semantic VAD: Intelligent voice activity detection with utterance prediction Multi-language Support: Azure Speech with en-IN and hi-IN language support End-of-Utterance Detection: AI-powered conversation turn management Filler Word Handling: Configurable processing of conversational fillers Agent Integration Code The voicelive_client.py demonstrates seamless integration with Azure AI Foundry Agents. Notice that we need to provide the Azure AI Foundry Project Name and an ID of the Agent in it. We do not need to pass the model's name here, since the Agent is already configured with one. 
```python
def get_websocket_url(self, access_token: str) -> str:
    """Generate WebSocket URL for Voice Live API."""
    azure_ws_endpoint = endpoint.rstrip("/").replace("https://", "wss://")
    return (
        f"{azure_ws_endpoint}/voice-live/realtime?api-version={api_version}"
        f"&agent-project-name={project_name}&agent-id={agent_id}"
        f"&agent-access-token={access_token}"
    )

async def connect(self):
    """Connects the client using a WS Connection to the Realtime API."""
    if self.is_connected():
        # raise Exception("Already connected")
        self.log("Already connected")

    # Get access token
    access_token = self.get_azure_token()

    # Build WebSocket URL and headers
    ws_url = self.get_websocket_url(access_token)
    self.ws = await websockets.connect(
        ws_url,
        additional_headers={
            "Authorization": f"Bearer {self.get_azure_token()}",
            "x-ms-client-request-id": str(uuid.uuid4()),
        },
    )
    print(f"Connected to Azure Voice Live API....")
    asyncio.create_task(self.receive())
    await self.update_session()
```

Advanced Analytics Pipeline

GPT-4o Powered Call Analysis
The solution implements conversation analytics using Azure Logic Apps with GPT-4o:

```json
{
  "functions": [
    {
      "name": "evaluate_call_log",
      "description": "Evaluate call log for Contoso Retail customer service call",
      "parameters": {
        "properties": {
          "call_reason": {
            "description": "Categorized call reason from 50+ predefined scenarios",
            "type": "string"
          },
          "customer_satisfaction": {
            "description": "Overall satisfaction assessment",
            "type": "string"
          },
          "customer_sentiment": {
            "description": "Emotional tone analysis",
            "type": "string"
          },
          "call_rating": {
            "description": "Numerical rating (1-5 scale)",
            "type": "number"
          },
          "call_rating_justification": {
            "description": "Detailed reasoning for rating",
            "type": "string"
          }
        }
      }
    }
  ]
}
```

Microsoft Fabric Integration
The analytics pipeline extends into Microsoft Fabric for enterprise business intelligence.

Fabric Integration Features:
- Real-time Data Mirroring: Cosmos DB to OneLake synchronization
- Custom Data Agents: business-specific analytics agents in Fabric
- Copilot Integration: natural language business intelligence queries
- Power BI Dashboards: interactive reports and executive summaries

Artefacts for reference
- The source code of the solution is available in the GitHub repo here.
- An article on this topic is published on LinkedIn here.
- A video recording of the demonstration of this app is available below:
  - Part 1: walkthrough of the Agent configuration in Azure AI Foundry - here
  - Part 2: demonstration of the application that integrates with the Azure Voice Live API - here
  - Part 3: demonstration of the Microsoft Fabric integration, Data Agents, Copilot in Fabric, and Power BI for insights and analysis - here

Conclusion
Azure Voice Live API enables enterprises to build sophisticated voice-enabled AI assistants using two distinct architectural approaches. The Direct Model Integration provides ultra-low latency for real-time applications, while the Azure AI Foundry Agent Integration offers enterprise-grade governance and autonomous operation.
Both approaches deliver the same comprehensive business capabilities:
- Natural voice interactions with advanced VAD and noise suppression
- Complete retail workflow automation from inquiry to fulfillment
- AI-powered conversation analytics with sentiment scoring
- Enterprise business intelligence through Microsoft Fabric integration

The choice between approaches depends on your specific requirements:
- Choose Direct Model Integration for custom function calling and minimal latency
- Choose Azure AI Foundry Agent Integration for enterprise governance and existing investments
Announcing the Text PII August preview model release in Azure AI Language

Azure AI Language is excited to announce a new preview model release for the PII (Personally Identifiable Information) redaction service, which includes support for more entities and languages, addressing customer-sourced scenarios and international use cases.

What's New | Updated Model 2025-08-01-preview
- Tier 1 language support for the DateOfBirth entity: expanding upon the original English-only support earlier this year, we've added support for all Tier 1 languages: French, German, Italian, Spanish, Portuguese, Brazilian Portuguese, and Dutch.
- New entity support:
  - SortCode: a financial code used in the UK and Ireland to identify the specific bank and branch where an account is held. Currently supported in English only.
  - LicensePlateNumber: the standard alphanumeric code for vehicle identification. Note that the current scope does not support a license plate that contains only letters. Currently supported in English only.
- AI quality improvements for financial entities, reducing false positives and false negatives.

These updates respond directly to customer feedback and address gaps in entity coverage and language support. The broader language support enables global deployments, and the new entity types allow for more comprehensive data extraction for our customers. This ensures improved service quality for financial, criminal justice, and many other regulatory use cases, enabling more accurate and reliable service for our customers.

Get started
A more detailed tutorial and overview of the service feature can be found in our public docs. Learn more about these releases and several others enhancing our Azure AI Language offerings on our What's new page.
- Explore Azure AI Language and its various capabilities
- Access full pricing details on the Language Pricing page
- Find the list of sensitive PII entities supported
- Try out Azure AI Foundry for a code-free experience

We are looking forward to continuously improving our product offerings and features to meet customer needs and are keen to hear any comments and feedback.
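As a quick illustration, here is a minimal sketch of calling PII redaction from Python with the azure-ai-textanalytics package. The endpoint and key variables are placeholders, and pinning the preview model via the model_version parameter is shown as an assumption based on the version named in this post; confirm the exact version string and entity availability in the public docs before relying on it.

```python
# pip install azure-ai-textanalytics
import os

from azure.ai.textanalytics import TextAnalyticsClient
from azure.core.credentials import AzureKeyCredential

# Placeholders: your Language resource endpoint and key.
client = TextAnalyticsClient(
    endpoint=os.environ["LANGUAGE_ENDPOINT"],
    credential=AzureKeyCredential(os.environ["LANGUAGE_KEY"]),
)

documents = [
    "Payment failed for sort code 12-34-56; the customer's car plate is AB12 CDE "
    "and their date of birth is 14 March 1990."
]

# model_version pins a specific model; "2025-08-01-preview" is assumed from this post.
result = client.recognize_pii_entities(documents, model_version="2025-08-01-preview")

for doc in result:
    if doc.is_error:
        print(doc.error)
        continue
    print(doc.redacted_text)  # text with detected PII masked
    for entity in doc.entities:
        print(entity.category, entity.text, round(entity.confidence_score, 2))
```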