Speech Recognition
Web Speech API stopped working on Edge starting with v.134
Starting with version 134 of the Edge browser (Windows 10, Edge Stable 134.0.3124.51, released March 6, 2025), speech recognition (Web Speech API) has been broken. This issue potentially impacts millions of users. Requests to Microsoft servers return a "Network error."

For more details, please refer to this ticket: https://github.com/speech-translator-ext/speech-translator-readme/issues/50
An easy way to test speech recognition: https://www.google.com/intl/en/chrome/demos/speech.html
About Web Speech API: https://developer.mozilla.org/en-US/docs/Web/API/Web_Speech_API#browser_compatibility

PS: It's also unfortunate that, as a web browser extension developer, I have no access to the "App Assure" program. (App Assure: Facing issues with your business apps or websites on the latest version of Microsoft Edge? Microsoft will help you resolve them at no additional cost.) In general, it's very unclear where to report such issues.

Create a Simple Speech REST API with Azure AI Speech Services
Explore the world of speech recognition and speech synthesis with Azure AI Services. In this tutorial, you will learn how to create your own simple Speech REST API using Azure AI Speech synthesis and Azure OpenAI services (or the OpenAI API). Experience the power of speech synthesis with Azure and explore the many possibilities Azure AI Services opens up for building powerful products.
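To give a feel for the synthesis building block such a tutorial rests on, here is a minimal illustrative sketch (not the tutorial's actual code) using the Azure Speech SDK for Python. The key, region, voice name, file name, and function name are placeholders and assumptions to swap for your own values:

    import azure.cognitiveservices.speech as speechsdk

    # Placeholder credentials - replace with your Azure AI Speech resource key and region
    SPEECH_KEY = "<your-speech-key>"
    SPEECH_REGION = "<your-region>"  # e.g. "eastus"

    def synthesize_to_file(text: str, output_path: str = "output.wav") -> None:
        # Configure the Speech service and pick a neural voice (voice name is an example)
        speech_config = speechsdk.SpeechConfig(subscription=SPEECH_KEY, region=SPEECH_REGION)
        speech_config.speech_synthesis_voice_name = "en-US-JennyNeural"
        audio_config = speechsdk.audio.AudioOutputConfig(filename=output_path)
        synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config, audio_config=audio_config)

        # speak_text_async returns a future; .get() blocks until synthesis finishes
        result = synthesizer.speak_text_async(text).get()
        if result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted:
            print(f"Saved synthesized speech to {output_path}")
        else:
            print(f"Synthesis did not complete: {result.reason}")

    if __name__ == "__main__":
        synthesize_to_file("Hello from Azure AI Speech!")

A REST layer (for example FastAPI or Flask) would then simply wrap a call like this behind an endpoint and return the resulting audio.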
Adding phonetic 'nicknames' to users' O365 account details to improve AA speech recognition results?
Hi, speech recognition works great for everyone in our org except for one person. We have an Irish lady with the first name Áine. It's pronounced "...On-ya…", so the AA doesn't have a chance (it keeps responding with "...Did you say 'Tanya'...?"). Is it possible to add a phonetic spelling for this user somewhere in their O365/Azure details that might help the AA in this situation?
Power Up Your Open WebUI with Azure AI Speech: Quick STT & TTS Integration

Introduction
Ever found yourself wishing your web interface could really talk and listen back to you? With a few clicks (and a bit of code), you can turn your plain Open WebUI into a full-on voice assistant. In this post, you'll see how to spin up an Azure Speech resource, hook it into your frontend, and watch as user speech transforms into text and your app's responses leap off the screen in a human-like voice. By the end of this guide, you'll have a voice-enabled web UI that actually converses with users, opening the door to hands-free controls, better accessibility, and a genuinely richer user experience. Ready to make your web app speak? Let's dive in.

Why Azure AI Speech?
We use the Azure AI Speech service in Open WebUI to enable voice interactions directly within web applications. This allows users to:
- Speak commands or input instead of typing, making the interface more accessible and user-friendly.
- Hear responses or information read aloud, which improves usability for people with visual impairments or those who prefer audio.
- Enjoy a more natural and hands-free experience, especially on devices like smartphones or tablets.
In short, integrating the Azure AI Speech service into Open WebUI helps make web apps smarter, more interactive, and easier to use by adding speech recognition and voice output features.

If you haven't hosted Open WebUI already, follow my other step-by-step guide to host Ollama WebUI on Azure. Proceed to the next step if you have Open WebUI deployed already. Learn more about Open WebUI here.

Deploy the Azure AI Speech service in Azure
1. Navigate to the Azure Portal and search for Azure AI Speech in the portal search bar.
2. Create a new Speech service by filling in the fields on the resource creation page.
3. Click "Create" to finalize the setup.
4. After the resource has been deployed, click the "View resource" button and you should be redirected to the Azure AI Speech service page.
5. The page should display the API keys and endpoints for the Azure AI Speech service, which you can use in Open WebUI.

Setting things up in Open WebUI

Speech to Text settings (STT)
Head to the Open WebUI Admin page > Settings > Audio. Paste the API key obtained from the Azure AI Speech service page into the API key field. Unless you use a different Azure region or want to change the default STT configuration, leave all other settings blank.

Text to Speech settings (TTS)
Now configure the TTS settings in Open WebUI by toggling the TTS Engine to the Azure AI Speech option. Again, paste the API key obtained from the Azure AI Speech service page and leave all other settings blank. You can change the TTS voice from the dropdown selection in the TTS settings. Click Save to apply the change.

Expected Result
Now, let's test if everything works well. Open a new chat / temporary chat in Open WebUI and click the Call / Record button. The STT engine (Azure AI Speech) should recognize your voice and provide a response based on the voice input. To test the TTS feature, click Read Aloud (the speaker icon) under any response from Open WebUI. The TTS engine should reflect the Azure AI Speech service!
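If the call or read-aloud test fails, a quick way to rule out a bad key or wrong region before digging into the Open WebUI settings is to request a token from the Speech service directly. This is an illustrative check, not part of the original walkthrough; replace the placeholder key and region with the values from your own resource:

    import requests

    # Placeholders - use the key and region shown on your Azure AI Speech resource page
    SPEECH_KEY = "<your-speech-key>"
    SPEECH_REGION = "<your-region>"  # e.g. "eastus"

    def check_speech_key() -> bool:
        # A 200 response from the issueToken endpoint means the key and region are valid
        url = f"https://{SPEECH_REGION}.api.cognitive.microsoft.com/sts/v1.0/issueToken"
        resp = requests.post(url, headers={"Ocp-Apim-Subscription-Key": SPEECH_KEY})
        if resp.status_code == 200:
            print("Key and region look good - safe to paste into Open WebUI.")
            return True
        print(f"Check failed: HTTP {resp.status_code} - {resp.text}")
        return False

    if __name__ == "__main__":
        check_speech_key()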
Conclusion
And that's a wrap! You've just given your Open WebUI the gift of capturing user speech, turning it into text, and then talking right back with Azure's neural voices. Along the way you saw how easy it is to spin up a Speech resource in the Azure portal, wire up real-time transcription in the browser, and pipe responses through the TTS engine. From here, it's all about experimentation. Try swapping in different neural voices or dialing in new languages. Tweak how you start and stop listening, play with silence detection, or add custom pronunciation tweaks for those tricky product names. Before you know it, your interface will feel less like a web page and more like a conversation partner.
Amalgamate Skype & Outlook; apply 3 language AI technologies to Skype; biological authentication
Why not combine Skype and Outlook into one? These two products have a lot in common: Outlook is messaging in text, while Skype is messaging in audio and instant video. Why not share a single account and do all the editing in Word on Edge? The more products you have, the fewer customers you get! Combining the Skype and Outlook accounts into one would help rival Google Voice, Telegram, WeChat, and WhatsApp. And what are the most important things Skype can do to overtake these competitors? I think there are two main points. First, apply speech recognition, translation, and speech synthesis to Skype. When a foreign customer calls you on Skype, speech recognition and translation can generate automatic simultaneous bilingual subtitles, and you can switch on a dub mode so the speech synthesis system produces instant interpretation. Secondly, apply facial recognition, fingerprint/palmprint recognition, voiceprint recognition, saliva DNA recognition, and other biometric recognition technologies to Skype login, instant payment, and commercial credit authentication. Make sure that every biological person has only one Skype account. Such biometric authentication could be widely used in instant payment, for example paying for transportation by facial or fingerprint recognition when commuting by drone, or paying through voiceprint and voice password after ordering food from an online supermarket. Biometric identity authentication can also help customers understand a merchant's financial standing and business credit to ensure transaction security.
Azure Communication Services - Python SDK Call Media not working with CallConnectionClient
Hi team, I'm working on a FastAPI service that uses Azure Communication Services Call Automation (Python SDK) to handle outbound PSTN calls and real-time speech interaction. So far it is able to place phone calls, but the media handling part during the conversation is not working.

Environment:
- Python version: 3.12-slim
- Package: azure-communication-callautomation (version 1.4.0)
- Hosting: Azure Container Apps
- The Speech cognitive resource is connected to Azure Communication Services
https://drive.google.com/file/d/1uC2S-LNx_Ybpp1QwOCtqFS9pwA84mK7h/view?usp=drive_link

What I'm trying to do:
- Place an outbound call to a PSTN number
- Play a greeting (TextSource) when the call is connected
- Start continuous speech recognition, forward the transcript to an AI endpoint, then play the response back

Code snippet:

    # Play greeting
    try:
        call_connection = client.get_call_connection(call_id)
        call_media = call_connection.call_media()
        call_media.play_to_all(
            play_source,
            operation_context="welcome-play"
        )
        print("Played welcome greeting.")
    except Exception as e:
        print("Play Greeting Failed: ", str(e))

    # Start recognition
    participants = list(call_connection.list_participants())
    for p in participants:
        if isinstance(p.identifier, PhoneNumberIdentifier):
            active_participants[call_id] = p.identifier
            try:
                call_connection = client.get_call_connection(call_id)
                call_media = call_connection.call_media()
                call_media.start_recognizing_media(
                    target_participant=p.identifier,
                    input_type="speech",
                    interrupt_call_media_operation=True,
                    operation_context="speech-recognition"
                )
                print("Started recognition immediately after call connected.")
            except Exception as e:
                print("Recognition start failed:", str(e))
            break

    target_participant = active_participants.get(call_id)
    if not target_participant:
        print(f"No PSTN participant found for call {call_id}, skipping recognition.")

Issue:
When the CallConnected event fires, I get different errors depending on which method I try:
- 'CallConnectionClient' object has no attribute 'call_media'
- 'CallConnectionClient' object has no attribute 'get_call_media_operations'
- 'CallConnectionClient' object has no attribute 'play_to_all'
- 'CallConnectionClient' object has no attribute 'get_call_media_client'
- 'CallConnectionClient' object has no attribute 'get_call_media'

Also some import errors:
- ImportError: cannot import name 'PlayOptions' from 'azure.communication.callautomation'
- ImportError: cannot import name 'RecognizeOptions' from 'azure.communication.callautomation'
- ImportError: cannot import name 'CallMediaRecognizeOptions' from 'azure.communication.callautomation'
- ImportError: cannot import name 'CallConnection' ... Did you mean: 'CallConnectionState'?

This makes me unsure which API is the correct/updated way to access play_to_all and start_recognizing_media.
https://drive.google.com/file/d/1xI-sWil0OKfAfGwjIgG25eD7CEK95rKc/view?usp=drive_link

Questions:
1. What is the current supported way to access call media operations (play / speech recognition) in the Python SDK?
2. Are there breaking changes between SDK versions that I should be aware of?
3. Should I upgrade to a specific minimum version to ensure .call_media works?

Thanks in advance!
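PS: For reference, this is the pattern I'm planning to try next, based on my unverified assumption that the play/recognize operations are exposed directly on CallConnectionClient in this SDK version rather than through a separate call media object. The connection string, call id, phone number, and voice name are placeholders; happy to be corrected if this is the wrong approach:

    # Assumption (unverified): media operations live directly on CallConnectionClient in 1.4.0.
    from azure.communication.callautomation import (
        CallAutomationClient,
        PhoneNumberIdentifier,
        RecognizeInputType,
        TextSource,
    )

    ACS_CONNECTION_STRING = "<acs-connection-string>"               # placeholder
    call_id = "<call-connection-id from the CallConnected event>"   # placeholder
    PSTN_NUMBER = "+1XXXXXXXXXX"                                    # placeholder

    client = CallAutomationClient.from_connection_string(ACS_CONNECTION_STRING)
    call_connection = client.get_call_connection(call_id)

    # Play the greeting to everyone on the call (no intermediate call_media object)
    play_source = TextSource(text="Hello, thanks for taking our call.", voice_name="en-US-JennyNeural")
    call_connection.play_media_to_all(play_source, operation_context="welcome-play")

    # Start speech recognition against the PSTN participant
    call_connection.start_recognizing_media(
        input_type=RecognizeInputType.SPEECH,
        target_participant=PhoneNumberIdentifier(PSTN_NUMBER),
        operation_context="speech-recognition",
    )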