Azure AI Language
Announcing the Text PII August preview model release in Azure AI Language
Azure AI Language is excited to announce a new preview model release for the PII (Personally Identifiable Information) redaction service, which includes support for more entities and languages, addressing customer-sourced scenarios and international use cases.

What's New | Updated Model 2025-08-01-preview

- Tier 1 language support for the DateOfBirth entity: expanding on the original English-only support released earlier this year, we've added support for all Tier 1 languages: French, German, Italian, Spanish, Portuguese, Brazilian Portuguese, and Dutch.
- New entity support:
  - SortCode: a financial code used in the UK and Ireland to identify the specific bank and branch where an account is held. Currently supported in English only.
  - LicensePlateNumber: the standard alphanumeric code for vehicle identification. Note that the current scope does not cover license plates containing only letters. Currently supported in English only.
- AI quality improvements for financial entities, reducing false positives and false negatives.

These updates respond directly to customer feedback and address gaps in entity coverage and language support. The broader language support enables global deployments, and the new entity types allow more comprehensive data extraction for our customers. This improves service quality for financial, criminal justice, and many other regulatory use cases, enabling more accurate and reliable results.

Get started

A more detailed tutorial and overview of the service feature can be found in our public docs.

- Learn more about these releases and several others enhancing our Azure AI Language offerings on our What's new page.
- Explore Azure AI Language and its various capabilities.
- Access full pricing details on the Language Pricing page.
- Find the list of sensitive PII entities supported.
- Try out Azure AI Foundry for a code-free experience.

We look forward to continuously improving our product offerings and features to meet customer needs, and we are keen to hear your comments and feedback.
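To see the update in action, here is a minimal sketch of a synchronous PII detection call that pins the new preview model and scopes detection to the newly announced entity types. The endpoint and key environment variables are placeholders, and the api-version value and exact piiCategories identifiers are assumptions; check the public docs for the REST version that exposes this preview model.

```python
# Minimal sketch: Text PII detection with the August preview model.
# Assumptions: LANGUAGE_ENDPOINT / LANGUAGE_KEY placeholders, the api-version
# value, and the exact piiCategories identifiers for the new entities.
import os
import requests

endpoint = os.environ["LANGUAGE_ENDPOINT"]  # e.g. https://<resource>.cognitiveservices.azure.com
key = os.environ["LANGUAGE_KEY"]

body = {
    "kind": "PiiEntityRecognition",
    "parameters": {
        "modelVersion": "2025-08-01-preview",  # the preview model announced above
        # Scope detection to the newly announced entity types:
        "piiCategories": ["DateOfBirth", "SortCode", "LicensePlateNumber"],
    },
    "analysisInput": {
        "documents": [{
            "id": "1",
            "language": "en",
            "text": "Sort code 12-34-56, plate AB12 CDE, born 01/02/1990.",
        }]
    },
}

resp = requests.post(
    f"{endpoint}/language/:analyze-text",
    params={"api-version": "2024-11-01"},  # assumption: a recent GA version
    headers={"Ocp-Apim-Subscription-Key": key},
    json=body,
    timeout=30,
)
resp.raise_for_status()
for doc in resp.json()["results"]["documents"]:
    for entity in doc["entities"]:
        print(entity["category"], "->", entity["text"], entity["confidenceScore"])
```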
Introducing Azure AI Models: The Practical, Hands-On Course for Real Azure AI Skills

Hello everyone, today I'm excited to share something close to my heart. After watching so many developers, myself included, get lost in a maze of scattered docs and endless tutorials, I knew there had to be a better way to learn Azure AI. So I decided to build a guide from scratch, with the goal of breaking things down step by step and making it easy for beginners to get started with Azure. My aim was to remove the guesswork and create a resource where anyone could jump in, follow along, and actually see results without feeling overwhelmed.

Introducing the Azure AI Models Guide. This is a brand new, solo-built, open-source repo aimed at making Azure AI accessible for everyone—whether you're just getting started or want to build real, production-ready apps using Microsoft's latest AI tools. The idea is simple: bring all the essentials into one place. You'll find clear lessons, hands-on projects, and sample code in Python, JavaScript, C#, and REST—all structured so you can learn step by step, at your own pace. I wanted this to be the resource I wish I'd had when I started: straightforward, practical, and friendly to beginners and pros alike.

It's early days for the project, but I'm excited to see it grow. If you're curious, check out the repo at https://github.com/DrHazemAli/Azure-AI-Models — your feedback, and maybe even your contributions, will help shape where it goes next!
Mastering Model Context Protocol (MCP): Building Multi Server MCP with Azure OpenAI

Create complex multi-MCP AI agentic applications. This deep dive covers a multi-server MCP implementation, connecting both local custom and ready-made MCP servers in a single client session through a custom chatbot interface.
Nested virtualization on Azure: a step-by-step guide

This article serves as a practical guide for developers and engineers to enable and configure nested virtualization on Azure. Nested virtualization allows running Hyper-V inside a virtual machine, providing enhanced flexibility and scalability for various development and data science applications. The guide walks through selecting the right Azure VM, setting up the environment, and installing Docker Desktop for efficient container management. It also addresses common troubleshooting tips to ensure a smooth setup. Whether you're working with complex machine learning models or developing applications, this guide will help you maximize the potential of nested virtualization on Azure.
Announcing the Extension of Some Language Understanding Intelligence Service (LUIS) Functionality

In 2022, we announced the deprecation of LUIS by September 30, 2025, with a recommendation to migrate to conversational language understanding (CLU). In response to feedback from our valued customers, we have decided to extend the availability of certain functionalities in LUIS until March 31, 2026. This extension aims to support our customers in their smooth migration to CLU, ensuring minimal disruption to their operations.

Extension Details

Here are the details on when and how LUIS functionality will change:

- October 2022: LUIS resource creation is no longer available.
- October 31, 2025: The LUIS portal will no longer be available. LUIS authoring (via REST API only) will continue to be available.
- March 31, 2026: LUIS authoring, including via REST API, will no longer be available. The LUIS runtime will no longer be available.

Before these retirement dates, please migrate to conversational language understanding (CLU), a capability of Azure AI Service for Language. CLU provides many of the same capabilities as LUIS, plus enhancements such as:

- Enhanced AI quality using state-of-the-art machine learning models
- The LLM-powered Quick Deploy feature to deploy a CLU model with no training
- Multilingual capabilities that allow you to train in one language and predict in 99+ others
- Built-in routing between conversational language understanding and custom question answering projects using orchestration workflow
- Access to a suite of features available on Azure AI Service for Language in the Azure AI Foundry

Looking Ahead

On March 31, 2026, LUIS will be fully deprecated, and any LUIS inferencing requests will return an error message. We encourage all our customers to complete their migration to CLU as soon as possible to avoid any disruptions. We appreciate your understanding and cooperation as we work together to ensure a smooth migration. Thank you for your continued support and trust in our services.
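If you are planning your migration, a CLU runtime request is the closest analogue of a LUIS prediction call. Below is a minimal sketch against the analyze-conversations endpoint; the project and deployment names are hypothetical, so substitute your own and confirm the api-version in the CLU documentation.

```python
# Minimal sketch: CLU prediction request, the replacement for a LUIS runtime call.
# Assumptions: endpoint/key placeholders and hypothetical project/deployment names.
import os
import requests

endpoint = os.environ["LANGUAGE_ENDPOINT"]
key = os.environ["LANGUAGE_KEY"]

body = {
    "kind": "Conversation",
    "analysisInput": {
        "conversationItem": {
            "id": "1",
            "participantId": "user",
            "text": "Book a flight to Paris next Friday",
        }
    },
    "parameters": {
        "projectName": "travel-assistant",   # hypothetical CLU project
        "deploymentName": "production",      # hypothetical deployment
        "stringIndexType": "Utf16CodeUnit",
    },
}

resp = requests.post(
    f"{endpoint}/language/:analyze-conversations",
    params={"api-version": "2023-04-01"},
    headers={"Ocp-Apim-Subscription-Key": key},
    json=body,
    timeout=30,
)
resp.raise_for_status()
prediction = resp.json()["result"]["prediction"]
print("Top intent:", prediction["topIntent"])
for intent in prediction["intents"]:
    print(f'  {intent["category"]}: {intent["confidenceScore"]:.2f}')
```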
Enter new era of enterprise communication with Microsoft Translator Pro & document image translation

Microsoft Translator Pro: standalone, native mobile experience

We are thrilled to unveil the gated public preview of Microsoft Translator Pro, our robust solution designed for enterprises seeking to dismantle language barriers in the workplace. Available on iOS, Microsoft Translator Pro offers a standalone, native experience, enabling speech-to-speech translated conversations among coworkers, users, or clients within your enterprise ecosystem.

Watch how Microsoft Translator Pro transforms a hotel check-in experience by breaking down language barriers. In this video, a hotel receptionist speaks in English, and the app translates and plays the message aloud in Chinese for the traveler. The traveler responds in Chinese, and the app translates and plays the message aloud in English for the receptionist.

Key features of the public preview

Our enterprise version of the app is packed with features tailored to meet the stringent demands of enterprises:

- Core feature, speech-to-speech translation: real-time speech-to-speech translation allows seamless communication with individuals speaking different languages.
- Unified experience: view or hear both transcription and translation simultaneously on a single device, ensuring smooth and efficient conversations.
- On-device translation: harness the app's speech-to-speech translation capability without an internet connection, in a limited set of languages, so your productivity remains unhampered.
- Full administrator control: enterprise IT administrators wield extensive control over the app's deployment and usage within your organization. They can fine-tune settings to manage conversation history, audit, and diagnostic logs, with the ability to disable history or configure automatic export of the history to cloud storage.
- Uncompromised privacy and security: Microsoft Translator Pro provides enterprises with a high level of translation quality and robust security. We know that privacy and security are top priorities for you. Once granted access by your organization's admin, you can sign in to the app with your organizational credentials. Your conversational data remains strictly yours, safeguarded within your Azure tenant. Neither Microsoft nor any external entities have access to your data.

Join the preview

To embark on this journey with us, please complete the gating form. Upon meeting the criteria, we will grant your organization access to the paid version of the Microsoft Translator Pro app, which is now available in the US. Learn more and get started: Microsoft Translator Pro documentation.

Document translation translates text embedded in images

Our commitment to advancing cross-language communication takes a major step forward with a new enhancement in Azure AI Translator's Document Translation (DT) feature. Previously, Document Translation supported fully digital documents and scanned PDFs. Starting January 2025, with this latest update, the service can also process mixed-content documents, translating both digital text and text embedded within images.

Sample document translated from English to Spanish (frames in order: source document; translated output document, image not translated; translated output document with image translation).

How it works

To enable this feature, the Document Translation service now leverages the Microsoft Azure AI Vision API to detect, extract, and translate text from images within documents.
This capability is especially useful for scenarios where documents contain a mix of digital text and image-based text, ensuring complete translations without manual intervention.

Getting started

To take advantage of this feature, customers can use a new optional parameter when setting up a translation request.

Request: a new parameter under "options" called "translateTextWithinImage" has been introduced. This parameter is of type Boolean, accepting "true" or "false". The default value is "false", so you'll need to set it to "true" to activate the image text translation capability.

Response: when this feature is enabled, the response will include additional details for transparency on image processing:

- totalImageScansSucceeded: the count of successfully translated image scans.
- totalImageScansFailed: the count of image scans that encountered processing issues.

Usage and cost

For this feature, customers will need to use the Azure AI Services resource, as this new feature leverages Azure AI Vision services along with Azure AI Translator. The OCR service incurs additional charges based on usage; see the pricing details page for OCR pricing.

Learn more and get started (starting January 2025): Translator Documentation.

These new advancements reflect our dedication to pushing boundaries in Document Translation, empowering enterprises to connect and collaborate more effectively, regardless of language. Stay tuned for more innovations as we continue to expand the reach and capabilities of Microsoft Azure AI Translator.
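For illustration, here is a hedged sketch of a batch Document Translation request with the option enabled. The storage SAS URLs are placeholders, the api-version is an assumption, and the placement of "options" follows the description above; confirm the exact request schema in the Document Translation reference.

```python
# Hedged sketch: batch document translation with image text translation enabled.
# Assumptions: SAS URL placeholders, the api-version value, and the exact
# placement of "options" (shown here as described in the announcement).
import os
import requests

endpoint = os.environ["TRANSLATOR_DOC_ENDPOINT"]  # document translation endpoint
key = os.environ["TRANSLATOR_KEY"]

body = {
    "inputs": [{
        "source": {"sourceUrl": "https://<account>.blob.core.windows.net/source?<sas>"},
        "targets": [{
            "targetUrl": "https://<account>.blob.core.windows.net/target?<sas>",
            "language": "es",
        }],
    }],
    "options": {
        "translateTextWithinImage": True,  # defaults to false; opt in explicitly
    },
}

resp = requests.post(
    f"{endpoint}/translator/document/batches",
    params={"api-version": "2024-05-01"},  # assumption; check the current reference
    headers={"Ocp-Apim-Subscription-Key": key},
    json=body,
    timeout=30,
)
resp.raise_for_status()
# The operation is asynchronous; poll the Operation-Location header for status,
# which also surfaces the totalImageScansSucceeded / totalImageScansFailed counts.
print("Poll status at:", resp.headers["Operation-Location"])
```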
Configure Embedding Models on Azure AI Foundry with Open Web UI

Introduction

Let's take a closer look at an exciting development in the AI space. Embedding models are the key to transforming complex data into usable insights, driving innovations like smarter chatbots and tailored recommendations. With Azure AI Foundry, Microsoft's powerful platform, you've got the tools to build and scale these models effortlessly. Add in Open Web UI, an intuitive interface for engaging with AI systems, and you've got a winning combo that's hard to beat. In this article, we'll explore how embedding models on Azure AI Foundry, paired with Open Web UI, are paving the way for accessible and impactful AI solutions for developers and businesses. Let's dive in!

To proceed with configuring the embedding model from Azure AI Foundry on Open Web UI, please first complete the requirements below.

Requirements:

- Set up an Azure AI Foundry hub/project.
- Deploy Open Web UI. Refer to my previous article on how you can deploy Open Web UI on an Azure VM.
- Optional: deploy LiteLLM with Azure AI Foundry models to work with Open Web UI. Refer to my previous article on how you can do this as well.

Deploying Embedding Models on Azure AI Foundry

Navigate to the Azure AI Foundry site and deploy an embedding model from the "Model + Endpoint" section. For the purpose of this demonstration, we will deploy the "text-embedding-3-large" model by OpenAI. You should receive a URL endpoint and an API key for the embedding model you just deployed. Take note of those credentials, because we will be using them in Open Web UI.

Configuring the embedding model on Open Web UI

Now head to the Open Web UI admin settings page > Documents and select Azure OpenAI as the embedding model engine. Copy and paste the base URL, the API key, the name of the embedding model deployed on Azure AI Foundry, and the API version (not the model version) into the corresponding fields, then click "Save" to apply the changes.
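Optionally, you can sanity-check the deployment with a direct call to the embeddings endpoint; if this works but Open Web UI does not, the issue is in the UI configuration rather than the deployment. The environment variables are placeholders, and the deployment name and api-version below are assumptions; use the values shown on your deployment's details page.

```python
# Optional sanity check: call the embedding deployment directly before (or
# after) configuring Open Web UI. Assumptions: env-var placeholders, the
# deployment name, and the api-version value.
import os
import requests

endpoint = os.environ["AZURE_OPENAI_ENDPOINT"]  # URL from the deployment page
key = os.environ["AZURE_OPENAI_KEY"]
deployment = "text-embedding-3-large"           # your deployment name

resp = requests.post(
    f"{endpoint}/openai/deployments/{deployment}/embeddings",
    params={"api-version": "2024-02-01"},       # use the api-version shown in Foundry
    headers={"api-key": key},
    json={"input": "Hello from Open Web UI"},
    timeout=30,
)
resp.raise_for_status()
vector = resp.json()["data"][0]["embedding"]
print(f"Received a {len(vector)}-dimensional embedding")  # 3072 for this model
```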
Expected Output

Now let us compare the two scenarios: without an embedding model configured on Open Web UI, and with the Azure OpenAI embedding model configured.

Conclusion

And there you have it! Embedding models on Azure AI Foundry, combined with the seamless interaction offered by Open Web UI, are truly revolutionizing how we approach AI solutions. This powerful duo not only simplifies the process of building and deploying intelligent systems but also makes cutting-edge technology more accessible to developers and businesses of all sizes. As we move forward, it's clear that such integrations will continue to drive innovation, breaking down barriers and unlocking new possibilities in the AI landscape. So, whether you're a seasoned developer or just stepping into this exciting field, now's the time to explore what Azure AI Foundry and Open Web UI can do for you. Let's keep pushing the boundaries of what's possible!

From Extraction to Insight: Evolving Azure AI Content Understanding with Reasoning and Enrichment

First introduced in public preview last year, Azure AI Content Understanding enables you to convert unstructured content—documents, audio, video, text, and images—into structured data. The service is designed to support consistent, high-quality output, directed improvements, built-in enrichment, and robust pre-processing to accelerate workflows and reduce cost.

A New Chapter in Content Understanding

Since our launch, we've seen customers pushing the boundaries, going beyond simple data extraction with agentic solutions that fully automate decisions. This requires more than just extracting fields. For example, a healthcare insurance provider's decision to pay a claim requires cross-checking against insurance policies, applicable contracts, the patient's medical history, and prescription datapoints. To do this, a system needs the ability to interpret information in context and perform more complex enrichments and analysis across various data sources. Beyond field extraction, this requires a custom-designed workflow leveraging reasoning.

In response to this demand, Content Understanding now introduces Pro mode, which enables enhanced reasoning, validation, and information aggregation capabilities. These updates allow the service to aggregate and compare results across sources, enrich extracted data with context, and deliver decisions as output. While Standard mode continues to offer reliable and scalable field extraction, Pro mode extends the service to support more complex content interpretation scenarios, enabling workflows that reflect the way people naturally reason over data. With this update, Content Understanding now solves a much larger component of your data processing workflows, offering new ways to automate, streamline, and enhance decision-making based on unstructured information.

Key Benefits of Pro Mode

Packed with cutting-edge reasoning capabilities, Pro mode revolutionizes document analysis.

- Multi-content input: process and aggregate information across multiple content files in a single request. Pro mode can build a unified schema from distributed data sources, enabling richer insight across documents.
- Multi-step reasoning: go beyond basic extraction with a process that supports reasoning, linking, validation, and enrichment.
- Knowledge base integration: seamlessly integrate with organizational knowledge bases and domain-specific datasets to enhance field inference. This ensures outputs reason over the task using the context of your business.

When to Use Pro Mode

Pro mode, currently limited to documents, is designed for scenarios where content understanding needs to go beyond surface-level extraction—ideal for use cases that traditionally require postprocessing, human review, and decision-making based on multiple data points and contextual references. Pro mode enables intelligent processing that not only extracts data but also validates, links, and enriches it. This is especially impactful when extracted information must be cross-referenced with external datasets or internal knowledge sources to ensure accuracy, consistency, and contextual depth. Examples include:

- Invoice processing that reconciles against purchase orders and contract terms
- Healthcare claims validation using patient records and prescription history
- Legal document review where clauses reference related agreements or precedents
- Manufacturing spec checks against internal design standards and safety guidelines

By automating much of the reasoning, you can focus on higher-value tasks!
Pro mode helps reduce manual effort, minimize errors, and accelerate time to insight, unlocking new potential for downstream applications, including those that emulate higher-order decision-making.

Simplified Pricing Model

We are introducing a simplified pricing structure that significantly reduces costs across all content modalities compared to previous versions, making enterprise-scale deployment more affordable and predictable.

Expanded Feature Coverage

We are also extending capabilities across various content types:

- Structured document outputs: improved handling of tables spanning multiple pages, recognition of selection marks, and support for additional file types like .docx, .xlsx, .pptx, .msg, .eml, .rtf, .html, .md, and .xml.
- Classifier API: automatically categorize, split, and route documents to appropriate processing pipelines.
- Video analysis: extract data across an entire video or break a video into chapters automatically. Enrich metadata with face identification and descriptions that include facial images.
- Face API preview: detect, recognize, and enroll faces, enabling richer user-aware applications.

Check out the details about each of these capabilities in What's New for Content Understanding.

Let's hear it from our customers

Customers all over the globe are using Content Understanding as a powerful one-stop solution, leveraging advanced modes of reasoning, grounding, and confidence scores across diverse content types.

ASC: AI-based analytics in ASC's Recording Insights platform allows customers to move to 100% compliance review coverage of conversations across multiple channels. ASC's integration of Content Understanding replaces a previously complex setup—where multiple separate AI services had to be manually connected—with a single multimodal solution that delivers transcription, summarization, sentiment analysis, and data extraction in one streamlined interface. This shift not only simplifies implementation and accelerates time-to-value but also received positive customer feedback for its powerful features and the quick, hands-on support from Microsoft product teams.

"With the integration of Content Understanding into the ASC Recording Insights platform, ASC was able to reduce R&D effort by 30% and achieve 5 times faster results than before. This helps ASC drive customer satisfaction and stay ahead of competition." —Tobias Fengler, Chief Engineering Officer, ASC

To learn more about ASC's integration, check out "From Complexity to Simplicity: The ASC and Azure AI Partnership."

Ramp: Ramp, the all-in-one financial operations platform, is exploring how Azure AI Content Understanding can help transform receipts, bills, and multi-line invoices into structured data automatically. Ramp is leveraging the pre-built invoice template and experimenting with custom extraction capabilities across various document types. These experiments are helping Ramp evaluate how to further reduce manual entry and enhance the real-time logic that powers approvals, policy checks, and reconciliation.

"Content Understanding gives us a single API to parse every receipt and statement we see—then lets our own AI reason over that data in real time. It's an efficient path from image to fully reconciled expense." —Rahul S, Head of AI, Ramp

MediaKind: MK.IO's cloud-native video platform, available on Azure Marketplace, now integrates Azure AI Content Understanding to make it easy for developers to personalize streaming experiences.
With just a few lines of code, you can turn full game footage into real-time, fan-specific highlight reels using AI-driven metadata like player actions, commentary, and key moments.

"Azure AI Content Understanding gives us a new level of control and flexibility—letting us generate insights instantly, personalize streams automatically, and unlock new ways to engage and monetize. It's video, reimagined." —Erik Ramberg, VP, MediaKind

Catch the full story from MediaKind in our breakout session at Build 2025 on May 18, "My Game, My Way," where we walk you through the creation of personalized highlight reels in real time. You'll never look at your TV in the same way again.

Getting Started

For more details about the latest from Content Understanding, check out:

- "Reasoning on multimodal content for efficient agentic AI app building," Wednesday, May 21 at 2 PM PST
- Build your own Content Understanding solution in the Azure AI Foundry. Pro mode will be available in the Foundry starting June 1st, 2025.
- Our documentation and sample code on Content Understanding
- The video series on getting started with Content Understanding
Announcing Azure AI Language new features to accelerate your agent development

In today's fast-moving AI landscape, businesses are racing to embed conversational intelligence and automation into every customer touchpoint. However, building a reliable and scalable agent from scratch remains complex and time-consuming. Developers tell us they need a streamlined way to map diverse user intents, craft accurate responses, and support global audiences without wrestling with ad hoc integrations. At the same time, rising expectations around data privacy and compliance introduce yet another layer of overhead.

To meet these challenges, today we're excited to announce a suite of powerful new tools and templates designed to help developers build intelligent agents faster than ever with our Azure AI Language service, working together with Azure AI Agent Service. Whether you're triaging user intents, serving up precise answers, or translating content on the fly, our latest releases have you covered. They include three ready-to-use agent templates and an MCP server, enhanced Conversational Language Understanding (CLU) and Custom Question Answering (CQA) with an all-new authoring experience in the Azure AI Foundry portal, an updated conversational agent accelerator project, and strengthened privacy controls in the Personally Identifiable Information (PII) detection service.

New Agent Templates

We are releasing three agent templates in the Azure AI Agent Service catalog to bootstrap developers addressing complex conversational scenarios efficiently: an intent routing agent, an exact question-answering agent, and a text translation agent. Each of these templates includes sample code available on GitHub to set up agents powered by core capabilities in Azure AI Language and Translator; a minimal agent-creation sketch follows the template descriptions below.

Figure 1: New agent templates available in agent catalog

Intent routing agent

Leverage the combined power of our Conversational Language Understanding (CLU) and Custom Question Answering (CQA) products. This template creates an agent that automatically detects which pre-defined business intent a user query maps to, or returns the exact answer verbatim via CQA. It gives you fully predictable and controllable intent routing with no custom model training required. You can further extend the capabilities of this agent based on your needs. For example, add additional Knowledge to the agent to handle non-critical and unpredictable user questions through RAG, or connect with other agents to route the user query based on identified intents. Check out the GitHub repo for more info about the intent routing agent.

Exact question-answering agent

Focused solely on delivering verbatim answers from your curated knowledge base in CQA, this template is perfect for creating agents for FAQ bots, support portals, and any scenario where precision matters above all else. As with the intent routing agent, you can enhance the exact question-answering agent with additional Knowledge to handle a wider range of user questions through RAG, improving traffic coverage and customer satisfaction. Check out the GitHub repo for more info about the exact question-answering agent.

Text translation agent

Integrate text translation seamlessly into your agent's workflow with Azure AI Translator. This template facilitates multilingual support through straightforward agent setup, enabling your agent to communicate with customers in their preferred language and manage translation requests across various languages with high accuracy. Check out the GitHub repo for more info about the text translation agent.
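To give a flavor of what the template sample code automates, here is a hedged sketch of creating a bare agent from a Foundry project with the azure-ai-projects SDK. The endpoint, model deployment name, and instructions are placeholders, and the SDK surface has shifted across preview versions; the full templates in the GitHub repos wire in the CLU/CQA tools and connections on top of this.

```python
# Hedged sketch: create a bare agent in Azure AI Agent Service.
# Assumptions: project endpoint placeholder, model deployment name, and
# instructions text; the real templates add tools and connections.
from azure.ai.projects import AIProjectClient
from azure.identity import DefaultAzureCredential

project = AIProjectClient(
    endpoint="https://<your-foundry-project-endpoint>",  # placeholder
    credential=DefaultAzureCredential(),
)

agent = project.agents.create_agent(
    model="gpt-4o",  # your model deployment name
    name="intent-routing-agent",
    instructions=(
        "Route each user message to a predefined intent via the connected CLU "
        "project, or answer FAQ-style questions verbatim via CQA."
    ),
)
print("Created agent:", agent.id)
```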
MCP Server with PII and Translator Tools

In addition to the agent templates, we are also announcing our new Language MCP server with built-in core Language service capabilities as tools. This first release includes PII detection and translation tooling, allowing developers to easily integrate it with any agent. Check out its source code and more details in the GitHub repo.

CLU and CQA Enhancements

To empower the new agent templates, we continued enhancing our CLU and CQA capabilities and experience.

LLM-based intent detection in CLU

The Conversational Language Understanding (CLU) service powers intent detection and entity extraction that can be customized for various businesses. In addition to the traditional model training experience, optimized for extremely high accuracy and low latency needs, CLU now also provides a new option that utilizes Azure OpenAI models to detect user intents. No additional training steps, datasets, or fine-tuning required: simply define your intents and quick-deploy.

Figure 2: Two CLU deployment options available in Azure AI Foundry portal

New query reference settings in CQA

The Custom Question Answering (CQA) service delivers highly precise responses from your pre-defined question-answer pairs, ensuring users receive the exact information your business requires. To further improve question understanding and configurability, we are introducing a "queryPreferences" property in the CQA API, with support for a new query matching policy, a semantic ranker, and the classic ranker used in QnA Maker, addressing the needs of QnA Maker customers migrating to CQA. All these new features will be available at the end of this week.
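As a reference point, a CQA runtime query looks like the sketch below. The project and deployment names are hypothetical, and the api-version shown is the long-standing GA one; the new "queryPreferences" property lands in a newer preview version, so consult the updated API reference for its exact schema.

```python
# Minimal sketch: querying a deployed CQA project. Assumptions: endpoint/key
# placeholders and hypothetical project/deployment names. The announced
# "queryPreferences" property requires a newer preview api-version.
import os
import requests

endpoint = os.environ["LANGUAGE_ENDPOINT"]
key = os.environ["LANGUAGE_KEY"]

resp = requests.post(
    f"{endpoint}/language/:query-knowledgebases",
    params={
        "projectName": "faq-project",     # hypothetical CQA project
        "deploymentName": "production",   # hypothetical deployment
        "api-version": "2021-10-01",      # GA version; preview adds queryPreferences
    },
    headers={"Ocp-Apim-Subscription-Key": key},
    json={"question": "How do I reset my password?", "top": 3},
    timeout=30,
)
resp.raise_for_status()
for answer in resp.json()["answers"]:
    print(f'{answer["confidenceScore"]:.2f}  {answer["answer"][:80]}')
```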
New authoring experience in Azure AI Foundry portal

We are introducing a CLU and CQA authoring experience in the Azure AI Foundry portal. Regardless of whether you are using an Azure AI Foundry resource, an AI hub resource, or an Azure AI Language resource, you can now create Custom Question Answering (CQA) tasks in your Foundry projects to manage your question-answer pairs, and enjoy all the above new capabilities too! By the end of this week, you will also be able to create Conversational Language Understanding (CLU) tasks in Foundry projects to manage your intents and entities. No more switching back and forth between the Foundry portal and Language Studio to manage different projects.

Figure 3: New CQA authoring experience in Azure AI Foundry portal

Updated Accelerator Project for Conversational Agent

Getting started has never been simpler. To empower developers to make the best use of the new agent templates, an updated accelerator project for the "Build your conversational agent" template will be available by the end of this week. The refreshed accelerator project will demonstrate how you can use the intent routing agent in an end-to-end solution, from front end to back end, with sample data for testing. To access the updated project at the end of this week, visit the Templates page in your Foundry project or check out the source code directly in the GitHub repo.

Figure 4: "Build your conversational agent" template in the Foundry portal

Enhanced Privacy Controls with Text PII Detection

When building agents with large language models, safeguarding user data is paramount. With today's announcement, we are introducing several new capabilities to our Text PII detection service, offering more customizability and broader entity and language coverage (a hedged request sketch follows this list):

- Support for PII redaction in scanned PDF documents. The document support in PII redaction allows you to provide a document file and get the redacted file in return. In addition to .docx, .txt, and text-based PDF files, you can now also provide a scanned .pdf file for redaction. For more information on how to use the native document support, see "Detect and redact Personally Identifying Information in native documents (preview)".
- Support for custom synonyms of PII detection entities. You can now use the "synonyms" property in the API call to define your own synonyms for a PII entity to achieve a better detection rate.
- Support for excluding specific entity values from detection. Use the "valueExclusionPolicy" property to specify words and terms that you want to exclude from PII detection.
- Extended context window span limit for rule-based entity detection. The context window span is the length of the continuous data interval (or "chunk") within your input text that the service internally takes at once when detecting entities. We've extended it to 500 characters for rule-based entities to match the span used by our model-based detectors, ensuring consistent detection behavior across all entities. For other service limits, including the maximum number of input characters, see "Data limits for Language service features".
- Support for a new entity type, DateOfBirth. For all supported entities, see "Supported Personally Identifiable Information (PII) entity categories".
- Enhanced capabilities in the Text PII container, including support for custom regex. Available only in our container offering, you can now define your own regular expressions directly within the Text PII container to catch any patterns you care about. By the end of this month, we'll also support more new languages in the PII container (Chinese, Japanese, Korean, and Thai) to keep parity in language support between the cloud service and the container.
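To make the new customization knobs concrete, here is a heavily hedged sketch of a request using them. The property names "synonyms" and "valueExclusionPolicy" come from the announcement above, but the nested shapes below are assumptions; verify them against the current preview API reference before relying on this.

```python
# Heavily hedged sketch: PII detection with the announced customization
# properties. The property names come from the announcement; the nested
# shapes below are ASSUMPTIONS and may differ from the shipped schema.
import os
import requests

endpoint = os.environ["LANGUAGE_ENDPOINT"]
key = os.environ["LANGUAGE_KEY"]

body = {
    "kind": "PiiEntityRecognition",
    "parameters": {
        # Assumed shape: map an entity type to extra trigger words.
        "synonyms": {"BankAccountNumber": ["acct no", "a/c number"]},
        # Assumed shape: values that should never be flagged.
        "valueExclusionPolicy": {
            "caseSensitive": False,
            "excludedValues": ["0000-0000"],
        },
    },
    "analysisInput": {
        "documents": [{
            "id": "1",
            "language": "en",
            "text": "Acct no 1234-5678 belongs to Jane; test value 0000-0000.",
        }]
    },
}

resp = requests.post(
    f"{endpoint}/language/:analyze-text",
    params={"api-version": "2024-11-15-preview"},  # assumption: a preview version
    headers={"Ocp-Apim-Subscription-Key": key},
    json=body,
    timeout=30,
)
resp.raise_for_status()
print(resp.json()["results"]["documents"][0].get("redactedText"))
```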
We can't wait to see the innovative agents you'll build with these new capabilities. Let us know what you create, share your feedback, and stay tuned for even more enhancements coming soon!

Resources:

- Azure AI Language
- Azure AI Translator
- Azure AI Agent Service
- Intent routing agent
- Exact question-answering agent
- Text translation agent
- MCP server with PII detection and translator
- Conversational Language Understanding (CLU)
- Custom Question Answering (CQA)
- "Build your conversational agent" accelerator project
- Personally Identifiable Information (PII) detection

Power Up Your Open WebUI with Azure AI Speech: Quick STT & TTS Integration

Introduction

Ever found yourself wishing your web interface could really talk and listen back to you? With a few clicks (and a bit of code), you can turn your plain Open WebUI into a full-on voice assistant. In this post, you'll see how to spin up an Azure Speech resource, hook it into your frontend, and watch as user speech transforms into text and your app's responses leap off the screen in a human-like voice. By the end of this guide, you'll have a voice-enabled web UI that actually converses with users, opening the door to hands-free controls, better accessibility, and a genuinely richer user experience. Ready to make your web app speak? Let's dive in.

Why Azure AI Speech?

We use the Azure AI Speech service in Open WebUI to enable voice interactions directly within web applications. This allows users to:

- Speak commands or input instead of typing, making the interface more accessible and user-friendly.
- Hear responses or information read aloud, which improves usability for people with visual impairments or those who prefer audio.
- Enjoy a more natural, hands-free experience, especially on devices like smartphones or tablets.

In short, integrating the Azure AI Speech service into Open WebUI helps make web apps smarter, more interactive, and easier to use by adding speech recognition and voice output features. If you haven't hosted Open WebUI already, follow my other step-by-step guide to host Ollama WebUI on Azure, and proceed to the next step once it is deployed. Learn more about Open WebUI here.

Deploy Azure AI Speech service in Azure

Navigate to the Azure portal and search for Azure AI Speech in the portal search bar. Create a new Speech service by filling in the fields on the resource creation page, then click "Create" to finalize the setup. After the resource has been deployed, click the "View resource" button and you should be redirected to the Azure AI Speech service page. The page displays the API keys and endpoints for the Azure AI Speech service, which you can use in Open WebUI.

Setting things up in Open WebUI

Speech to Text settings (STT)

Head to the Open WebUI admin page > Settings > Audio. Paste the API key obtained from the Azure AI Speech service page into the API key field. Unless you use a different Azure region or want to change the default STT configuration, leave all other settings blank.

Text to Speech settings (TTS)

Now configure the TTS settings on Open WebUI by toggling the TTS engine to the Azure AI Speech option. Again, paste the API key obtained from the Azure AI Speech service page and leave all other settings blank. You can change the TTS voice from the dropdown selection in the TTS settings. Click "Save" to apply the changes.

Expected Result

Now, let's test that everything works. Open a new chat / temporary chat on Open WebUI and click the Call / Record button. The STT engine (Azure AI Speech) should recognize your voice and provide a response based on the voice input. To test the TTS feature, click Read Aloud (the speaker icon) under any response from Open WebUI. The TTS engine should now be Azure AI Speech!
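If you want to confirm the Speech resource itself is healthy, independent of Open WebUI, a short script with the Azure Speech SDK (pip install azure-cognitiveservices-speech) exercises both directions. The key and region environment variables are placeholders, and the voice name is just one of the available neural voices.

```python
# Optional sanity check of the same Speech resource outside Open WebUI.
# Assumptions: SPEECH_KEY / SPEECH_REGION placeholders and the chosen voice.
import os
import azure.cognitiveservices.speech as speechsdk

speech_config = speechsdk.SpeechConfig(
    subscription=os.environ["SPEECH_KEY"],
    region=os.environ["SPEECH_REGION"],  # e.g. "eastus"
)
speech_config.speech_synthesis_voice_name = "en-US-JennyNeural"

# Text to speech: plays through the default speaker.
synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config)
result = synthesizer.speak_text_async("Azure AI Speech is wired up correctly.").get()
print("TTS result:", result.reason)

# Speech to text: listens once on the default microphone.
recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config)
print("Say something...")
print("Heard:", recognizer.recognize_once_async().get().text)
```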
Conclusion

And that's a wrap! You've just given your Open WebUI the gift of capturing user speech, turning it into text, and then talking right back with Azure's neural voices. Along the way you saw how easy it is to spin up a Speech resource in the Azure portal, wire up real-time transcription in the browser, and pipe responses through the TTS engine. From here, it's all about experimentation. Try swapping in different neural voices or dialing in new languages. Tweak how you start and stop listening, play with silence detection, or add custom pronunciation tweaks for those tricky product names. Before you know it, your interface will feel less like a web page and more like a conversation partner.