Checklist for advanced calling experiences in web browsers
One of the core scenarios we help developers deliver is virtual visits, when a business hosting a meeting invites an external customer to a video call. Virtual visits occur across industries: a virtual medical appointment (tele-health), a meeting about a mortgage application (tele-banking), or even a virtual courtroom. Azure Communication Services supports web, native mobile, and service applications, so you can reach more customers on any device or platform. But web applications are especially popular for virtual visit end-user experiences. When well implemented, web applications allow users on any device to join a communication experience simply by using a hyperlink, without installing an app.

You can take advantage of Azure's open-source UI SDK to build mobile-friendly web experiences incredibly quickly. The UI SDK makes building an app very easy while still allowing significant customization of appearance and theming. But some developers may want to use the underlying Azure Communication Services Web Calling SDK directly, most often to customize the user experience beyond the scope of the UI SDK controls. We wrote this post as a checklist to help advanced developers deliver key user experiences with the low-level Calling SDK:

- Request access to a user's audio and video devices
- Continue the call even when the browser is in the background
- Handle UI changes to layout, like when screen-sharing is enabled
- Handle changes to a call's state, such as being put on hold
- Handle changes to a user's devices, such as when a headset is turned off

All of this material is covered in our Calling How To guide. If you run into trouble or have any feedback, please contact our team through Azure Q&A or Azure Support.

Permissions

A web application must acquire access to the user's microphone and camera in order to participate in a voice or video call. Most browsers and operating systems protect users by requiring apps to request access to these devices explicitly and launch standardized UI elements. In some cases users may not see these pop-ups if their permissions are blocked persistently. This happens most commonly when an end-user declines an access request once (perhaps accidentally) and selects "never for this website", or when the user has configured the operating system to restrict device access for the browser wholesale (for example, through the macOS setting that controls browser access to the camera and microphone). It's crucial for the application to make sure users have permissions for the microphone and camera and to walk them through the permissions process. Otherwise the user will not be heard or seen!

Recommended code flow

    const callClient = new CallClient();
    const callAgent = await callClient.createCallAgent(tokenCredential, {...});
    const deviceManager = await callClient.getDeviceManager();
    const permissions = await deviceManager.askDevicePermission({ audio: true, video: true });
    if (permissions.audio && permissions.video) {
        // allow calling
    } else {
        // show an educational page that teaches users how to grant/enable permissions based on OS/browser
    }

Common mistakes

- If audio/video permissions are not granted, do not allow the user to continue into the call. Instead, give the user instructions on how to grant permissions, with detailed information based on the OS, version, browser, and permission status (e.g., denied forever).
- Audio/video permissions are granted "too late". Users should grant permissions early, not while the call is already in progress.
- Permission requests time out after 20 seconds. If the user doesn't grant permissions during that time, the permissions will be marked as declined even if the user grants them at a later stage.
- If microphone permissions are not granted and the user ends up in the call, your app will observe an unexpected mute through the user-facing diagnostics APIs.
- Apps should not use any third-party libraries that request device permissions; that can cause issues with ACS permissions and introduce regressions during the actual call. Make sure no microphone or speaker volume indicators are used.

Recovery

There are two cases to explore:

- Permissions denied at the browser level. If the user is not in the call, ask them to refresh the browser and allow permissions. If the user is in the call, you can hang up and refresh in code:

    await callAgent.hangup();
    window.location.reload();

- Permissions denied at the OS level, or "Never for this website". In this case a refresh won't fix the permissions. The web application has to inform the user how to enable permissions based on the OS and browser, and then ask them to refresh the page using the instructions above once the permissions are enabled.

Interruptions

During an active call, even if the application successfully got access to the device's microphone and camera, at any point another application or the operating system can take over those devices. We call this an interruption. For example, the user is in an ACS call but answers a traditional phone call; suddenly the microphone access from the web page goes to the phone application. There are many types of interruptions, including:

- PSTN call: takes over the microphone.
- FaceTime call on iOS: takes over the microphone and camera.
- Siri: takes over the microphone.
- Voice recorder: takes over the microphone.
- YouTube: takes over the speakers and microphone.

Recommended code flow

1. The user is in an ACS call, unmuted, with audio and video, on Safari on iOS.
2. The user receives a PSTN call and accepts or rejects it.
3. The user navigates back to the ACS call; now the state is muted and video is turned off. The application UI should show the correct state on the buttons and possibly inform the user to re-enable them.
4. The user unmutes and starts video again.
5. The user is in the call, unmuted, with audio and video.

    call.feature(Features.UserFacingDiagnostics).media.on('diagnosticChanged', (diagnosticInfo: MediaDiagnosticChangedEventArgs) => {
        if (diagnosticInfo.diagnostic === 'microphoneMuteUnexpectedly' || diagnosticInfo.diagnostic === 'microphoneNotFunctioning') {
            if (diagnosticInfo.value === true) {
                // Inform the user that the microphone stopped working, permanently or temporarily, and ask them to unmute
            } else {
                // Inform the user that the microphone recovered
            }
        }
    });

Common mistakes

Applications often fail to indicate that something interrupted the call flow and that the state is no longer what it was. Make sure the user is informed and the call button state reflects reality.

Recovery

Guide the user to unmute and restart video after the interruption. You can verify that the recovery was successful by listening for the corresponding user-facing diagnostics returning to a healthy value, as in the sketch below. Specifically, on iOS your app should inform the user to tap unmute and start video to get back to a full audio/video ACS call, while on Android the app should inform the user to start video only. Other platforms should recover seamlessly without any user action.
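As a rough sketch (not the only way to do this), a recovery handler wired to an "unmute and start video" prompt could look like the following. It assumes the call object and a previously constructed localVideoStream from the snippets above, and it uses the SDK state rather than local UI state to decide what needs to be restored.

    // Minimal sketch: after an interruption, restore audio/video based on SDK state.
    const recoverFromInterruption = async () => {
        try {
            if (call.isMuted) {
                await call.unmute();
            }
            if (call.localVideoStreams.length === 0) {
                await call.startVideo(localVideoStream);
            }
        } catch (error) {
            console.log(error);
        }
    };

Once the good diagnostic values arrive, you can clear any warning banner and let the button state fall back to reflecting the SDK state.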
Background

On iOS and Android, when there is an active call with audio and video and the user moves the browser application to the background, audio keeps flowing automatically but outgoing video stops automatically. This happens for privacy reasons and is controlled by the operating system.

Recommended code flow

The application should inform the user when that happens, and the camera button should reflect the correct state.

    call.feature(Features.UserFacingDiagnostics).media.on('diagnosticChanged', (diagnosticInfo: MediaDiagnosticChangedEventArgs) => {
        if (diagnosticInfo.diagnostic === 'cameraStoppedUnexpectedly') {
            if (diagnosticInfo.value === true) {
                // Inform the user that the camera stopped working and ask them to start video again
            } else {
                // Inform the user that the camera recovered
            }
        }
    });

Common mistakes

A very common mistake is that the application has custom state management, and that state is updated only on user action while ignoring unpredictable events like interruptions. Avoid maintaining the state at the UI level; instead ask the SDK whether the user is muted or has video on (to check whether the user is muted, use call.isMuted), make sure you subscribe correctly to all the edge cases, and update the UI when needed. For example, a video toggle driven by the SDK state rather than UI state:

    const toggleVideo = async () => {
        if (call.localVideoStreams.length > 0) {
            try {
                await call.stopVideo(call.localVideoStreams[0]);
            } catch (error) {
                console.log(error);
            }
        } else {
            try {
                await call.startVideo(localVideoStream);
            } catch (error) {
                console.log(error);
            }
        }
    };

Layout updates

Very often web applications change to different layouts during a call, going from a grid layout with all participants to a screen-sharing layout or a one-to-one call. It is important to the end-user that you support these varied layouts dynamically.

Recommended code flow

Your app should monitor participants being added and removed, as well as stream changes on those participants, and use that information to render and update what is attached to the DOM. When the elements on the screen that contain these views are re-created, make sure those views are re-attached accordingly.

    // Whenever participants are added/removed or when the layout changes, the application has to re-create
    // the DOM stream renderers or re-use/re-attach them properly after the layout update is done.
    const render = async (stream) => {
        stream.on('isAvailableChanged', async () => {
            // handle state changes
        });
        if (stream.isAvailable) {
            try {
                const renderer = new VideoStreamRenderer(stream);
                const view = await renderer.createView();
                // attach to the DOM; renderers can be re-used, but have to be re-attached
                // after full layout changes or if the DOM elements are re-created
                videoContainer.appendChild(view.target);
            } catch (error) {
                console.log(error);
            }
        }
    };

    call.on('remoteParticipantsUpdated', e => {
        e.added.forEach(p => {
            // subscribe to participant events
            p.on('videoStreamsUpdated', (d) => {
                d.added.forEach(stream => {
                    // render the new remote stream
                    render(stream);
                });
                d.removed.forEach(stream => {
                    // clean up the renderers
                    cleanup(stream);
                });
            });
            // render all the existing streams from the participant
            p.videoStreams.forEach(stream => {
                // create renderers for all streams
                render(stream);
            });
        });
        e.removed.forEach(p => {
            // unsubscribe and clean up all of the participant's streams
            p.videoStreams.forEach(stream => {
                cleanup(stream);
            });
        });
    });
    ...
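The snippet above assumes a cleanup helper that disposes the renderer associated with a stream; the Calling SDK does not provide one out of the box. A minimal sketch, assuming you track renderers in a Map keyed by stream, might look like this:

    // Hypothetical helper: track each stream's renderer so it can be disposed later.
    const renderers = new Map();

    const registerRenderer = (stream, renderer) => {
        renderers.set(stream, renderer);
    };

    const cleanup = (stream) => {
        const renderer = renderers.get(stream);
        if (renderer) {
            // dispose() detaches and destroys all views created from this renderer
            renderer.dispose();
            renderers.delete(stream);
        }
    };

With this approach, render() would call registerRenderer(stream, renderer) right after creating the renderer, so every stream can be cleaned up when it or its participant is removed.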
Common mistakes

A very common mistake is for applications to create these elements once and never listen for changes, whether participant changes, stream changes, or UI changes. That can cause issues with missing participants, missing camera or screen-sharing streams, or missing everything.

Recovery

Make sure your app subscribes to all the necessary APIs and updates the DOM elements accordingly when needed.

Call actions

Many call-related actions like start/stop video or mute/unmute are async operations. Triggering them repeatedly without waiting for the action to complete can have unpredictable results, especially if the application keeps the state at the UI level.

Recommended code flow

Check the SDK state to see whether the user is muted or not, then trigger the action and wait for it to complete.

    const toggleMute = async () => {
        try {
            if (!call.isMuted) {
                await call.mute();
            } else {
                await call.unmute();
            }
        } catch (e) {
            console.log(e);
        }
    };

Common mistakes

A common mistake is using local state instead of SDK state to decide whether the app should mute or unmute the call. This local state can be out of sync at any point because of unpredictable events (e.g., interruptions), and that will cause your toggle-mute function to try to mute an already muted user.

Recovery

Protect your APIs from abuse: make sure the application waits on each async call and properly updates the state when needed.

Devices

The application should handle device enumeration, listen for device changes, and update accordingly. It should allow users to select devices other than the default ones and pick a different microphone, camera, or speaker when needed.

Recommended code flow

Applications should allow users to select their preferred device, while also listening for changes to the existing devices and selecting an alternative if one suddenly becomes unavailable.

    deviceManager.on('videoDevicesUpdated', async e => {
        e.added.forEach(addedCameraDevice => {
            // camera added
        });
        e.removed.forEach(removedCameraDevice => {
            // camera removed
        });
        // If the camera currently in use was removed, pick another available one.
        // If the user is in the call, switch the removed camera's source to the new one, if any.
        const cameras = await deviceManager.getCameras();
        if (cameras.length > 0) {
            await call.localVideoStreams[0]?.switchSource(cameras[0]);
        }
    });

    deviceManager.on('audioDevicesUpdated', e => {
        e.added.forEach(addedAudioDevice => {
            if (addedAudioDevice.deviceType === 'Speaker') {
                // it is a speaker
            }
            if (addedAudioDevice.deviceType === 'Microphone') {
                // it is a microphone
            }
        });
        e.removed.forEach(removedAudioDevice => {
            // audio device removed
            if (removedAudioDevice.deviceType === 'Speaker') {
                // it is a speaker
            }
            if (removedAudioDevice.deviceType === 'Microphone') {
                // it is a microphone
            }
        });
    });

    deviceManager.on('selectedSpeakerChanged', (e) => {
        // update the selected speaker device
    });

    deviceManager.on('selectedMicrophoneChanged', (e) => {
        // update the selected microphone device
    });

Common mistakes

A common mistake is when apps use the default device without making it clear to the end-user that they can select a different one. Devices can break or stop working, and not allowing the user to select another device, or not forcing an update to an alternative when something is no longer available, can cause issues during an active call.

Recovery

The application should listen for device updates and switch sources accordingly when needed; it should also let the user pick devices explicitly, as in the sketch below.
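A minimal sketch of applying the user's device choices, assuming the deviceManager and call objects from the snippets above and that this runs from your settings UI (the indices stand in for whatever the user picked):

    const applyDeviceSelection = async () => {
        const cameras = await deviceManager.getCameras();
        const microphones = await deviceManager.getMicrophones();
        const speakers = await deviceManager.getSpeakers();

        // Microphone and speaker are selected on the device manager itself.
        await deviceManager.selectMicrophone(microphones[0]);
        await deviceManager.selectSpeaker(speakers[0]);

        // There is no selectCamera; instead, switch the source of the local video stream in the call.
        await call.localVideoStreams[0]?.switchSource(cameras[0]);
    };

Note that speaker selection is not supported on every browser, so a settings UI should hide or disable options the device manager does not report.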
Call state

Calls have plenty of states that apps need to handle correctly.

    // available call states
    'None' | 'Connecting' | 'Ringing' | 'Connected' | 'LocalHold' | 'InLobby' | 'Disconnecting' | 'Disconnected' | 'EarlyMedia'

Recommended code flow

The application should handle all of the call states above and update the UI accordingly. Many actions should be protected; for example, mute/unmute should not be allowed while the call is in the ringing state, and should be allowed only when it makes sense. Applications should also take care of updating other elements, such as stopping the ringing sound once the call state is no longer ringing.

    const callStateChanged = () => {
        switch (call.state) {
            case 'Connecting':
                // render the connecting screen
                break;
            case 'Connected':
                // render the connected screen
                break;
            default:
                // no call
                break;
        }
    };
    call.on('stateChanged', callStateChanged);
    callStateChanged();

Common mistakes

Applications don't cover all states and very often presume that the states are sequential. Even though calls usually go from None -> Connecting -> Connected -> Disconnecting -> Disconnected, sometimes the transitions between states are so fast that some of them would mistakenly look like they never happened, and applications won't handle them at all. Another common pitfall is for applications to handle only the happy path and ignore the rest: for example, User A calls User B and User B accepts the call, but what happens if User B rejects? How should the UI inform the user about that?

Recovery

Apps should be able to handle multiple calls and call states. Nothing blocks the user from having an active call and receiving another; the application should be able to handle all these edge cases by listening for changes on existing and new calls.

    callAgent.on('callsUpdated', e => {
        e.added.forEach(call => {
            // listen for state changes on these calls and update the UI accordingly
        });
        e.removed.forEach(call => {
            // clean up removed calls
        });
    });

    callAgent.on('incomingCall', args => {
        args.incomingCall.on('callEnded', args => {
            // display a call ended message
        });
    });
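For instance, a minimal sketch of handling a second incoming call while another call is active could put the first call on hold before accepting the new one. Here activeCall is an assumed reference to the call currently in progress; it is not part of the SDK.

    callAgent.on('incomingCall', async args => {
        // Put the current call on hold before accepting the new one,
        // then listen for state changes on the newly accepted call.
        if (activeCall && activeCall.state === 'Connected') {
            await activeCall.hold();
        }
        const newCall = await args.incomingCall.accept();
        newCall.on('stateChanged', () => {
            // update the UI based on newCall.state
        });
    });

A production app would typically prompt the user before accepting, and call resume() on the held call when the second call ends.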
Simpler, Faster Azure Communication Services Email Now Generally Available

Today, Azure Communication Services announces the general availability of its Email service, expanding its comprehensive suite of multichannel communication APIs. Businesses can embed email in their customer engagement applications to enhance their customers' communication experience. This new Email service is powered by Exchange Online and meets the security and privacy requirements of enterprises. In addition, Azure Communication Services Email provides low-code/no-code capabilities through Power Platform and Azure Logic Apps. With in-depth analytics and engagement tracking capabilities, Azure Communication Services Email helps businesses improve their customer engagement and communication strategies.
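As an illustration of what embedding email can look like, here is a minimal sketch using the @azure/communication-email JavaScript SDK (assuming a current 1.x version). The connection string, sender address, and recipient are placeholders you would replace with your own resource's values.

    import { EmailClient } from "@azure/communication-email";

    // Placeholder connection string from your Azure Communication Services resource
    const emailClient = new EmailClient("<your-acs-connection-string>");

    async function sendConfirmation() {
        // beginSend returns a poller that tracks the send operation
        const poller = await emailClient.beginSend({
            senderAddress: "donotreply@<your-verified-domain>",
            content: {
                subject: "Your virtual visit is confirmed",
                plainText: "We look forward to seeing you at your appointment."
            },
            recipients: {
                to: [{ address: "customer@example.com" }]
            }
        });
        const result = await poller.pollUntilDone();
        console.log(`Email send status: ${result.status}`);
    }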
Ignite 2024: Bidirectional real-time audio streaming with Azure Communication Services

Today at Microsoft Ignite, we are excited to announce the upcoming preview of bidirectional audio streaming for the Azure Communication Services Call Automation SDK, which unlocks new possibilities for developers and businesses. This capability enables seamless, low-latency, real-time communication when integrated with services like Azure OpenAI and the real-time voice APIs, significantly enhancing how businesses can build and deploy conversational AI solutions.

With the advent of new AI technologies, companies are developing solutions to reduce customer wait times and improve the overall customer experience. To achieve this, many businesses are turning to AI-powered agents. These AI-based agents must be capable of having conversations with customers in a human-like manner while maintaining very low latencies to ensure smooth interactions. This is especially critical in the voice channel, where any delay can significantly impact the fluidity and natural feel of the conversation. With bidirectional streaming, businesses can now elevate their voice solutions to low-latency, human-like, interactive conversational AI agents.

Our bidirectional streaming APIs enable developers to stream audio from an ongoing call on Azure Communication Services to their web server in real time. On the server, powerful language models interpret the caller's query and stream the responses back to the caller. All this is accomplished while maintaining low latency, ensuring the caller feels like they are speaking to a human. One example would be to take the audio streams, process them through Azure OpenAI's real-time voice API, and then stream the responses back into the call. With the integration of bidirectional streaming into the Azure Communication Services Call Automation SDK, developers have new tools to innovate:

- Leverage conversational AI solutions: Develop sophisticated customer support virtual agents that can interact with customers in real time, providing immediate responses and solutions.
- Personalized customer experiences: By harnessing real-time data, businesses can offer more personalized and dynamic customer interactions, leading to increased satisfaction and loyalty.
- Reduce wait times for customers: By using bidirectional audio streams in combination with Large Language Models (LLMs), you can build virtual agents that serve as the first point of contact, reducing the need for customers to wait for a human agent to become available.

Integrating with real-time voice-based Large Language Models (LLMs)

With the advancements in voice-based LLMs, developers want to take advantage of services like bidirectional streaming and send audio directly between the caller and the LLM. Today we'll show you how to start audio streaming through Azure Communication Services. Developers can start bidirectional streaming at the time of answering the call by providing the WebSocket URL.

    // Answer the call with bidirectional streaming
    websocketUri = appBaseUrl.Replace("https", "wss") + "/ws";
    var options = new AnswerCallOptions(incomingCallContext, callbackUri)
    {
        MediaStreamingOptions = new MediaStreamingOptions(
            transportUri: new Uri(websocketUri),
            contentType: MediaStreamingContent.Audio,
            audioChannelType: MediaStreamingAudioChannel.Mixed,
            startMediaStreaming: true)
        {
            EnableBidirectional = true,
            AudioFormat = AudioFormat.Pcm24KMono
        }
    };

At the same time, you should open your connection with the Azure OpenAI real-time voice API.
Once the WebSocket connection is set up, Azure Communication Services starts streaming audio to your web server. From there you can relay the audio to the Azure OpenAI voice API and vice versa. Once the LLM reasons over the content provided in the audio, it streams audio to your service, which you can stream back into the Azure Communication Services call. (More information about how to set this up will be made available after Ignite.)

    // Receiving streaming data from Azure Communication Services over the WebSocket
    private async Task StartReceivingFromAcsMediaWebSocket()
    {
        if (m_webSocket == null) return;
        try
        {
            while (m_webSocket.State == WebSocketState.Open)
            {
                byte[] receiveBuffer = new byte[2048];
                WebSocketReceiveResult receiveResult = await m_webSocket.ReceiveAsync(new ArraySegment<byte>(receiveBuffer), m_cts.Token);
                if (receiveResult.MessageType == WebSocketMessageType.Close) continue;

                var data = Encoding.UTF8.GetString(receiveBuffer).TrimEnd('\0');
                if (StreamingData.Parse(data) is AudioData audioData)
                {
                    using var ms = new MemoryStream(audioData.Data);
                    await m_aiServiceHandler.SendAudioToExternalAI(ms);
                }
            }
        }
        catch (Exception ex)
        {
            Console.WriteLine($"Exception -> {ex}");
        }
    }

Streaming audio data back into Azure Communication Services

    // Create and serialize streaming data
    private void ConvertToAcsAudioPacketAndForward(byte[] audioData)
    {
        var audio = new OutStreamingData(MediaKind.AudioData)
        {
            AudioData = new AudioData(audioData)
        };
        // Serialize the JSON object to a string
        string jsonString = System.Text.Json.JsonSerializer.Serialize<OutStreamingData>(audio);
        // Queue the async operation for later execution
        try
        {
            m_channel.Writer.TryWrite(async () => await m_mediaStreaming.SendMessageAsync(jsonString));
        }
        catch (Exception ex)
        {
            Console.WriteLine($"Exception received on ReceiveAudioForOutBound {ex}");
        }
    }

    // Send encoded data over the WebSocket to Azure Communication Services
    public async Task SendMessageAsync(string message)
    {
        if (m_webSocket?.State == WebSocketState.Open)
        {
            byte[] jsonBytes = Encoding.UTF8.GetBytes(message);
            // Send the PCM audio chunk over the WebSocket
            await m_webSocket.SendAsync(new ArraySegment<byte>(jsonBytes), WebSocketMessageType.Text, endOfMessage: true, CancellationToken.None);
        }
    }

To reduce developer overhead when integrating with voice-based LLMs, Azure Communication Services supports a new sample rate of 24 kHz, eliminating the need for developers to resample audio data and helping preserve audio quality in the process.

Next steps

The SDK and documentation will be available in the next few weeks after this announcement, offering tools and information to integrate bidirectional streaming and utilize voice-based LLMs in your applications. Stay tuned and check our blog for updates!
Anywhere365 integrates Azure Communication Services into their Dialogue Cloud Platform

Anywhere365 incorporates Azure Communication Services into its cloud contact center offering alongside integration with Microsoft Teams. The outcome helps businesses achieve more with AI-powered intelligent B2C communication solutions.