Azure Communication Services technical documentation table of contents update
Technical documentation is like a map for using a platform: whether you're building services, solving problems, or learning new features, great documentation shows you the way to the solution you need. But what good is a map if it's hard to read or confusing to follow? That's why easy-to-navigate documentation is so important. It saves time, reduces frustration, and helps users focus on what they want to achieve. Azure Communication Services is a powerful platform, and powerful platforms require great documentation for both new and experienced developers. Our customers consistently tell us that our docs are a crucial part of their experience of using our platform, and some studies suggest that documentation and samples are the most important elements of a great developer experience. In this update, we're excited to share how we've improved our technical documentation's navigation to make it quicker and simpler than ever to find the information you need when you need it.

Why did we change?

For our content to be useful to you, it first needs to be findable. When we launched Azure Communication Services, the small number of articles on our site made it easy to navigate and find relevant content. As we've grown, though, the sheer quantity of articles has made relevant content harder to find. To refresh your memory, the table of contents on our docs site used to be structured with these base categories:

- Overview
- Quickstart
- Tutorials
- Samples
- Concepts
- Resources
- References

These directory names describe the type of content they contain. This structure is a useful model for products with a clearly defined set of use cases, where a customer's job-to-be-done is typically more constrained, but it breaks down for complex, powerful platforms that support a broad range of use cases in the way that Azure Communication Services does. We tried a number of small-scale changes to address the problems people were having on our site, such as having certain directories default to open on page load, but as the site grew, we became concerned that our navigation model was confusing users and having a negative impact on their experience with our product. We decided to test that hypothesis and consider different structures that might serve our content and our customers better.

Our user research team interviewed 18 customers with varying levels of experience on our platform. The research uncovered several problems with the way our docs navigation was structured: confusing folder titles, related topics sitting far apart in the navigation model, and difficulty finding even the most basic information about using our platform, among other issues. It was clear we had a problem to fix for our users.

What did we change in this release?

To help address these issues, we made a few key changes to make our table of contents simpler and easier to navigate. The changes were strictly to site navigation, not page content, and they include:

- We've restructured the root-level navigation around communication modality and feature type, rather than content type, to better model our customers' jobs-to-be-done.
  Topics include:
  - All supported communication channels
  - Horizontal features that span more than one channel
  - Topics of special interest to our customers, like AI
  - Basic needs, like troubleshooting and support
  This allows customers to more easily find the content they need by focusing on the job they need to do, rather than on the content type.
- We've simplified the overview and fundamentals sections to make the site less overwhelming on first load.
- We've surfaced features that customers told us were difficult to find, such as UI Library, Teams interop, and Job Router.
- We've organized the content within each directory to roughly follow a beginner-to-expert path, making the content more linear and helping users find the next step in completing their task.
- We've removed unnecessary layers in our navigation, making content easier to find.
- We've added a link to pricing information to each primitive to address a common customer complaint that pricing information is difficult to find and understand.
- We've combined quickstarts, samples, and tutorials into one directory per primitive, called "Samples and tutorials", to address a customer complaint that our category names were confusing.
- We've added a Resources directory to each primitive, to keep important information close by.
- We've added root-level directories for Common Scenarios, Troubleshooting, and Help and support.
- We did a full pass across all TOC entries to ensure correct casing, and edited entries for readability, consistency with page content, and length, in line with Microsoft guidelines.

These changes give us a structure that we feel is less taxing for the reader, especially on a first visit; maps more closely to the customer's mental model by focusing on the job-to-be-done rather than the content type; leads readers through the content from easiest to hardest; makes it easier to find the information they need when they need it; and reminds them of all the different features we support. Here's what the table of contents looks like on page load as of February 6. These changes are live now. You can see them on the Azure Communication Services technical documentation site.

What's next

In the coming weeks we will continue to make refinements based on customer feedback and our assessment of usage metrics. Our content team will begin updating article content to improve readability and enhance learning. We will be monitoring our changes and seeking your feedback.

How will we monitor the effectiveness of our changes?

To track the effectiveness of our changes and to be sure we haven't regressed, we'll be tracking a few key metrics:

- Bounce rates: We'll be on the lookout for an increase in bounce rates, which would indicate that customers are frequently landing on pages that don't meet their expectations.
- Page views: We'll be tracking the number of page views for our most-visited pages across different features. A decrease in page views for these pages will be an indicator that customers are not able to find pages that had previously been popular.
- Customer interviews: We will be reaching out to some of you over the coming weeks to get your impressions of the new structure of our content.
- Customer surveys: We've created a survey that you can use to give us your feedback. We'll also be adding this link to select pages so you can tell us what you think of our changes while you're using them!
So, give our new site navigation a try, and please don't hesitate to share your feedback, either by filling out our survey or by sending an email to acs-docs-feedback@microsoft.com. We look forward to hearing from you!

Build your own real-time voice agent - Announcing preview of bidirectional audio streaming APIs
We are pleased to announce the public preview of bidirectional audio streaming, enhancing the capabilities of voice-based conversational AI. During Satya Nadella's keynote at Ignite, Seth Juarez demonstrated a voice agent engaging in a live phone conversation with a customer. You can now create similar experiences using the Azure Communication Services bidirectional audio streaming APIs and the GPT-4o model. In our recent Ignite blog post, we announced the upcoming preview of our audio streaming APIs. Now that they are publicly available, this blog describes how to use the bidirectional audio streaming APIs in the Azure Communication Services Call Automation SDK to build low-latency voice agents powered by the GPT-4o Realtime API.

How does the bidirectional audio streaming API enhance the quality of voice-driven agent experiences?

AI-powered agents facilitate seamless, human-like interactions and can engage with users through various channels such as chat or voice. In voice communication, low latency in conversational responses is crucial, as delays can cause users to perceive a lack of response and disrupt the flow of conversation. Gone are the days when building a voice bot required stitching together multiple models for transcription, inference, and text-to-speech conversion. Developers can now stream live audio from an ongoing call (VoIP or telephony) to their backend server logic using the bidirectional audio streaming APIs, leverage GPT-4o to process the audio input, and deliver responses back to the caller with minimal latency.

Building Your Own Real-Time Voice Agent

In this section, we walk you through a quickstart for using Call Automation's audio streaming APIs to build a voice agent. Before you begin, ensure you have the following:

- Active Azure subscription: Create an account for free.
- Azure Communication Services resource: Create an Azure Communication Services resource and record your resource connection string for later use.
- Azure Communication Services phone number: A calling-enabled phone number. You can buy a new phone number or use a free trial number.
- Azure Dev Tunnels CLI: For details, see Enable dev tunnel.
- Azure OpenAI resource: Set up an Azure OpenAI resource by following the instructions in Create and deploy an Azure OpenAI Service resource.
- Azure OpenAI Service model: To use this sample, you must have the GPT-4o-Realtime-Preview model deployed. Follow the instructions at GPT-4o Realtime API for speech and audio (Preview) to set it up.
- Development environment: Familiarity with .NET and basic asynchronous programming.

Clone the quickstart sample application. You can find the quickstart at Azure Communication Services Call Automation and Azure OpenAI Service.

git clone https://github.com/Azure-Samples/communication-services-dotnet-quickstarts.git

After completing the prerequisites, open the cloned project and follow these setup steps.

Environment Setup

Before running this sample, you need to set up the previously mentioned resources with the following configuration updates:

1. Set up and host your Azure dev tunnel

Azure Dev Tunnels is an Azure service that enables you to expose locally hosted web services to the internet. Use the following commands to connect your local development environment to the public internet. This creates a tunnel with a persistent endpoint URL and enables anonymous access. We use this endpoint to notify your application of calling events from the Azure Communication Services Call Automation service.
devtunnel create --allow-anonymous
devtunnel port create -p 5165
devtunnel host

2. Navigate to the quickstart CallAutomation_AzOpenAI_Voice in the project you cloned.

3. Add the required API keys and endpoints

Open the appsettings.json file and add values for the following settings:

- DevTunnelUri: Your dev tunnel endpoint
- AcsConnectionString: Azure Communication Services resource connection string
- AzureOpenAIServiceKey: Azure OpenAI Service key
- AzureOpenAIServiceEndpoint: Azure OpenAI Service endpoint
- AzureOpenAIDeploymentModelName: Azure OpenAI model deployment name

Run the Application

- Ensure your Azure Dev Tunnel URI is active and points to the correct port of your localhost application.
- Run the command dotnet run to build and run the sample application.
- Register an Event Grid webhook for the IncomingCall event that points to your dev tunnel URI (https://<your-devtunnel-uri>/api/incomingCall). For more information, see Incoming call concepts.

Test the app

Once the application is running:

- Call your Azure Communication Services number: Dial the number set up in your Azure Communication Services resource. A voice agent answers, enabling you to converse naturally.
- View the transcription: See a live transcription in the console window.

QuickStart Walkthrough

Now that the app is running and testable, let's explore the quickstart code snippets and how to use the new APIs. Within the Program.cs file, the endpoint /api/incomingCall handles inbound calls.

app.MapPost("/api/incomingCall", async (
    [FromBody] EventGridEvent[] eventGridEvents,
    ILogger<Program> logger) =>
{
    foreach (var eventGridEvent in eventGridEvents)
    {
        Console.WriteLine($"Incoming Call event received.");

        // Handle system events
        if (eventGridEvent.TryGetSystemEventData(out object eventData))
        {
            // Handle the subscription validation event.
            if (eventData is SubscriptionValidationEventData subscriptionValidationEventData)
            {
                var responseData = new SubscriptionValidationResponse
                {
                    ValidationResponse = subscriptionValidationEventData.ValidationCode
                };
                return Results.Ok(responseData);
            }
        }

        var jsonObject = Helper.GetJsonObject(eventGridEvent.Data);
        var callerId = Helper.GetCallerId(jsonObject);
        var incomingCallContext = Helper.GetIncomingCallContext(jsonObject);
        var callbackUri = new Uri(new Uri(appBaseUrl), $"/api/callbacks/{Guid.NewGuid()}?callerId={callerId}");
        logger.LogInformation($"Callback Url: {callbackUri}");

        var websocketUri = appBaseUrl.Replace("https", "wss") + "/ws";
        logger.LogInformation($"WebSocket Url: {websocketUri}");

        var mediaStreamingOptions = new MediaStreamingOptions(
            new Uri(websocketUri),
            MediaStreamingContent.Audio,
            MediaStreamingAudioChannel.Mixed,
            startMediaStreaming: true)
        {
            EnableBidirectional = true,
            AudioFormat = AudioFormat.Pcm24KMono
        };

        var options = new AnswerCallOptions(incomingCallContext, callbackUri)
        {
            MediaStreamingOptions = mediaStreamingOptions,
        };

        AnswerCallResult answerCallResult = await client.AnswerCallAsync(options);
        logger.LogInformation($"Answered call for connection id: {answerCallResult.CallConnection.CallConnectionId}");
    }
    return Results.Ok();
});

In the preceding code, MediaStreamingOptions encapsulates all the configuration for bidirectional streaming.

- WebSocketUri: We use the dev tunnel URI with the WebSocket protocol, appending the path /ws. This path handles the WebSocket messages.
- MediaStreamingContent: The current version of the API supports only audio.
- Audio Channel: Supported formats include:
  - Mixed: Contains the combined audio streams of all participants on the call, flattened into one stream.
  - Unmixed: Contains a single audio stream per participant per channel, with support for up to four channels for the most dominant speakers at any given time. You also get a participantRawID to identify the speaker.
- StartMediaStreaming: When set to true, this flag enables the bidirectional stream automatically once the call is established.
- EnableBidirectional: This enables audio sending and receiving. By default, your application only receives audio data from Azure Communication Services.
- AudioFormat: This can be either 16 kHz pulse code modulation (PCM) mono or 24 kHz PCM mono.

Once you configure all these settings, you need to pass them to AnswerCallOptions.

Now that the call is established, let's dive into handling the WebSocket messages. This code snippet handles the audio data received over the WebSocket. The WebSocket's path is specified as /ws, which corresponds to the WebSocketUri provided in the configuration.

app.Use(async (context, next) =>
{
    if (context.Request.Path == "/ws")
    {
        if (context.WebSockets.IsWebSocketRequest)
        {
            try
            {
                var webSocket = await context.WebSockets.AcceptWebSocketAsync();
                var mediaService = new AcsMediaStreamingHandler(webSocket, builder.Configuration);

                // Set the single WebSocket connection
                await mediaService.ProcessWebSocketAsync();
            }
            catch (Exception ex)
            {
                Console.WriteLine($"Exception received {ex}");
            }
        }
        else
        {
            context.Response.StatusCode = StatusCodes.Status400BadRequest;
        }
    }
    else
    {
        await next(context);
    }
});

The await mediaService.ProcessWebSocketAsync() call processes all incoming messages. The method establishes a connection with OpenAI, initiates a conversation session, and waits for responses from OpenAI, ensuring seamless communication between the application and OpenAI and enabling real-time audio processing and interaction.

// Method to receive messages from WebSocket
public async Task ProcessWebSocketAsync()
{
    if (m_webSocket == null)
    {
        return;
    }

    // Start forwarder to AI model
    m_aiServiceHandler = new AzureOpenAIService(this, m_configuration);
    try
    {
        m_aiServiceHandler.StartConversation();
        await StartReceivingFromAcsMediaWebSocket();
    }
    catch (Exception ex)
    {
        Console.WriteLine($"Exception -> {ex}");
    }
    finally
    {
        m_aiServiceHandler.Close();
        this.Close();
    }
}

Once the application receives data from Azure Communication Services, it parses the incoming JSON payload to extract the audio data segment and forwards that segment to OpenAI for further processing. The parsing ensures data integrity before sending it to OpenAI for analysis.
// Receive messages from WebSocket
private async Task StartReceivingFromAcsMediaWebSocket()
{
    if (m_webSocket == null)
    {
        return;
    }
    try
    {
        while (m_webSocket.State == WebSocketState.Open || m_webSocket.State == WebSocketState.Closed)
        {
            byte[] receiveBuffer = new byte[2048]; // receive buffer (size assumed here; match the buffer size used in the sample)
            WebSocketReceiveResult receiveResult = await m_webSocket.ReceiveAsync(new ArraySegment<byte>(receiveBuffer), m_cts.Token);

            if (receiveResult.MessageType != WebSocketMessageType.Close)
            {
                string data = Encoding.UTF8.GetString(receiveBuffer).TrimEnd('\0');
                await WriteToAzOpenAIServiceInputStream(data);
            }
        }
    }
    catch (Exception ex)
    {
        Console.WriteLine($"Exception -> {ex}");
    }
}

Here is how the application parses and forwards the data segment to OpenAI using the established session:

private async Task WriteToAzOpenAIServiceInputStream(string data)
{
    var input = StreamingData.Parse(data);
    if (input is AudioData audioData)
    {
        using (var ms = new MemoryStream(audioData.Data))
        {
            await m_aiServiceHandler.SendAudioToExternalAI(ms);
        }
    }
}

Once the application receives the response from OpenAI, it formats the data to be forwarded to Azure Communication Services and relays the response into the call. If the application detects voice activity while OpenAI is talking, it sends a barge-in message to Azure Communication Services to stop the audio currently playing in the call.

// Loop and wait for the AI response
private async Task GetOpenAiStreamResponseAsync()
{
    try
    {
        await m_aiSession.StartResponseAsync();
        await foreach (ConversationUpdate update in m_aiSession.ReceiveUpdatesAsync(m_cts.Token))
        {
            if (update is ConversationSessionStartedUpdate sessionStartedUpdate)
            {
                Console.WriteLine($"<<< Session started. ID: {sessionStartedUpdate.SessionId}");
                Console.WriteLine();
            }

            if (update is ConversationInputSpeechStartedUpdate speechStartedUpdate)
            {
                Console.WriteLine($" -- Voice activity detection started at {speechStartedUpdate.AudioStartTime} ms");
                // Barge-in: stop the audio currently playing in the call
                var jsonString = OutStreamingData.GetStopAudioForOutbound();
                await m_mediaStreaming.SendMessageAsync(jsonString);
            }

            if (update is ConversationInputSpeechFinishedUpdate speechFinishedUpdate)
            {
                Console.WriteLine($" -- Voice activity detection ended at {speechFinishedUpdate.AudioEndTime} ms");
            }

            if (update is ConversationItemStreamingStartedUpdate itemStartedUpdate)
            {
                Console.WriteLine($" -- Begin streaming of new item");
            }

            // Audio transcript updates contain the incremental text matching the generated output audio.
            if (update is ConversationItemStreamingAudioTranscriptionFinishedUpdate outputTranscriptDeltaUpdate)
            {
                Console.Write(outputTranscriptDeltaUpdate.Transcript);
            }

            // Audio delta updates contain the incremental binary audio data of the generated output audio
            // matching the output audio format configured for the session.
            if (update is ConversationItemStreamingPartDeltaUpdate deltaUpdate)
            {
                if (deltaUpdate.AudioBytes != null)
                {
                    var jsonString = OutStreamingData.GetAudioDataForOutbound(deltaUpdate.AudioBytes.ToArray());
                    await m_mediaStreaming.SendMessageAsync(jsonString);
                }
            }

            if (update is ConversationItemStreamingTextFinishedUpdate itemFinishedUpdate)
            {
                Console.WriteLine();
                Console.WriteLine($" -- Item streaming finished, response_id={itemFinishedUpdate.ResponseId}");
            }

            if (update is ConversationInputTranscriptionFinishedUpdate transcriptionCompletedUpdate)
            {
                Console.WriteLine();
                Console.WriteLine($" -- User audio transcript: {transcriptionCompletedUpdate.Transcript}");
                Console.WriteLine();
            }

            if (update is ConversationResponseFinishedUpdate turnFinishedUpdate)
            {
                Console.WriteLine($" -- Model turn generation finished. Status: {turnFinishedUpdate.Status}");
            }

            if (update is ConversationErrorUpdate errorUpdate)
            {
                Console.WriteLine();
                Console.WriteLine($"ERROR: {errorUpdate.Message}");
                break;
            }
        }
    }
    catch (OperationCanceledException e)
    {
        Console.WriteLine($"{nameof(OperationCanceledException)} thrown with message: {e.Message}");
    }
    catch (Exception ex)
    {
        Console.WriteLine($"Exception during AI streaming -> {ex}");
    }
}

Once the data is prepared for Azure Communication Services, the application sends it over the WebSocket:

public async Task SendMessageAsync(string message)
{
    if (m_webSocket?.State == WebSocketState.Open)
    {
        byte[] jsonBytes = Encoding.UTF8.GetBytes(message);

        // Send the PCM audio chunk over WebSocket
        await m_webSocket.SendAsync(new ArraySegment<byte>(jsonBytes), WebSocketMessageType.Text, endOfMessage: true, CancellationToken.None);
    }
}

This wraps up our QuickStart overview. We hope you create outstanding voice agents with the new audio streaming APIs. Happy coding!

For more information about the Azure Communication Services bidirectional audio streaming APIs, check out:

- GPT-4o Realtime API for speech and audio (Preview)
- Audio streaming overview - audio subscription
- Quickstart - Server-side Audio Streaming

Moving Email Traffic from Exchange to Azure Communication Services
This article describes the important steps and considerations for transitioning from Exchange Online or other on-premises solutions to the Azure Communication Services email platform. It is also relevant for customers migrating from other mail solutions, whether on-premises or hybrid, to Azure Communication Services.
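For readers new to the destination platform, here is a minimal, illustrative sketch of sending a message through the Azure Communication Services Email SDK for .NET (Azure.Communication.Email). The connection string, sender address, and recipient below are placeholders; the sender address must belong to a verified domain connected to your Communication Services resource.

using Azure;
using Azure.Communication.Email;

// Placeholders: use your own resource connection string and a sender address
// from a verified domain connected to your Communication Services resource.
var emailClient = new EmailClient("<acs-connection-string>");

EmailSendOperation operation = await emailClient.SendAsync(
    WaitUntil.Completed,
    senderAddress: "DoNotReply@<your-verified-domain>",
    recipientAddress: "user@contoso.com",
    subject: "Hello from Azure Communication Services",
    htmlContent: "<p>This message was sent through the Azure Communication Services email platform.</p>");

Console.WriteLine($"Email send status: {operation.Value.Status}");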
Create next-gen voice agents with Azure AI's Voice Live API and Azure Communication Services

Today at Microsoft Build, we're excited to announce the General Availability of bidirectional audio streaming for the Azure Communication Services Call Automation SDK, unveiling the power of speech-to-speech AI through Azure Communication Services. As previously seen at Microsoft Ignite in November 2024, the Call Automation bidirectional streaming APIs already work with services like Azure OpenAI to build conversational voice agents through speech-to-speech integrations. Now, with the General Availability release of the Call Automation bidirectional streaming API and the Azure AI Speech Voice Live API (Preview), creating voice agents has never been easier. Imagine AI agents that deliver seamless, low-latency, and naturally fluent conversations, transforming the way businesses and customers interact.

The bidirectional streaming APIs allow customers to stream audio from ongoing calls to their web server in near real time, where their voice-enabled large language models (LLMs) can ingest the audio, reason over it, and stream voice responses back into the call. In this release we have also added support for JSON Web Token (JWT) based authentication on the websocket connection, helping developers build secure solutions (a token-validation sketch follows the language support section below).

As industries like customer service, education, HR, gaming, and public services see a surge in demand for generative AI voice chatbots, businesses are seeking real-time, natural-sounding voice interactions with the latest GenAI models. Integrating Azure Communication Services with the new Voice Live API from Azure AI Speech provides a low-latency interface for streaming speech input and output with Azure AI Speech's advanced audio and voice capabilities. It supports multiple languages, diverse voices, and customization, and can even integrate with avatars for enhanced engagement. On the server side, powerful language models interpret the caller's queries and stream human-like responses back in real time, ensuring fluid and engaging conversations. By integrating these two technologies, customers can create innovative solutions such as:

- Multilingual agents: Develop virtual customer service representatives capable of conversing with end customers in their preferred language, so that customers building for multilingual regions can create one solution that serves multiple languages and regions.
- Noise suppression and echo cancellation: AI voice agents need clear audio to understand what the user is requesting. To improve AI efficiency, you can use the out-of-the-box noise suppression and echo cancellation built into the Voice Live API to give your agent the best possible audio quality, so it can clearly understand and assist end users.
- Support for branded voices: Build voice agents that stay on brand, with custom voices that represent your brand in every customer interaction. Use Azure AI Speech to create custom voice models that represent your brand and provide familiarity for your customers.

How to Integrate Azure Communication Services with the Azure AI Speech Voice Live API

Language support

With the integration with the Voice Live API, you can now create solutions for more than 150 locales for speech input and output, with more than 600 realistic voices out of the box. If these voices don't suit your needs, you can take this one step further and create custom speech models for your brand.
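To make use of the JWT-based websocket authentication mentioned above, your server can validate the bearer token presented by Azure Communication Services before accepting the media-streaming connection. The following is a minimal, illustrative sketch using Microsoft.IdentityModel and System.IdentityModel.Tokens.Jwt; the metadata URL, issuer, and audience shown here are assumptions, so take the exact values from the Call Automation security documentation for your resource.

using Microsoft.IdentityModel.Protocols;
using Microsoft.IdentityModel.Protocols.OpenIdConnect;
using Microsoft.IdentityModel.Tokens;
using System.IdentityModel.Tokens.Jwt;

static async Task<bool> ValidateMediaStreamingTokenAsync(string bearerToken, string acsResourceId)
{
    // Fetch the signing keys from the OpenID metadata endpoint (placeholder URL:
    // confirm the exact address in the Call Automation documentation).
    var configurationManager = new ConfigurationManager<OpenIdConnectConfiguration>(
        "https://acscallautomation.communication.azure.com/calling/keys",
        new OpenIdConnectConfigurationRetriever());
    OpenIdConnectConfiguration config = await configurationManager.GetConfigurationAsync(CancellationToken.None);

    var validationParameters = new TokenValidationParameters
    {
        ValidateIssuer = true,
        ValidIssuer = "https://acscallautomation.communication.azure.com", // assumed issuer
        ValidateAudience = true,
        ValidAudience = acsResourceId, // the audience is expected to be your ACS resource ID
        IssuerSigningKeys = config.SigningKeys,
        ValidateLifetime = true
    };

    try
    {
        // Throws if the token is invalid; otherwise the websocket upgrade can proceed.
        new JwtSecurityTokenHandler().ValidateToken(bearerToken, validationParameters, out _);
        return true;
    }
    catch (SecurityTokenException)
    {
        return false;
    }
}

In a server like the one in the earlier quickstart, this check would run before accepting the websocket upgrade, rejecting the connection when validation fails.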
How to start bidirectional streaming

var mediaStreamingOptions = new MediaStreamingOptions(
    new Uri(websocketUri),
    MediaStreamingContent.Audio,
    MediaStreamingAudioChannel.Mixed,
    startMediaStreaming: true)
{
    EnableBidirectional = true,
    AudioFormat = AudioFormat.Pcm24KMono
};

How to connect to Voice Live API (Preview)

string GetSessionUpdate()
{
    var jsonObject = new
    {
        type = "session.update",
        session = new
        {
            turn_detection = new
            {
                type = "azure_semantic_vad",
                threshold = 0.3,
                prefix_padding_ms = 200,
                silence_duration_ms = 200,
                remove_filler_words = false
            },
            input_audio_noise_reduction = new
            {
                type = "azure_deep_noise_suppression"
            },
            input_audio_echo_cancellation = new
            {
                type = "server_echo_cancellation"
            },
            voice = new
            {
                name = "en-US-Aria:DragonHDLatestNeural",
                type = "azure-standard",
                temperature = 0.8
            }
        }
    };

    // Serialize the session.update payload for sending over the Voice Live websocket (assumes System.Text.Json).
    return JsonSerializer.Serialize(jsonObject);
}

Next Steps

The SDK and documentation will be available in the next few weeks following this announcement, allowing you to build your own solutions using Azure Communication Services and the Azure AI Voice Live API. You can download our latest sample from GitHub to try this for yourself. To learn more about the Voice Live API and all its different capabilities, see the Azure AI Blog.

AI-Powered Chat with Azure Communication Services and Azure OpenAI
Many applications offer chat with automated capabilities but lack the depth to fully understand and address user needs. What if a chat app could not only connect people but also improve conversations with AI insights? Imagine detecting customer sentiment, bringing in experts as needed, and supporting global customers with real-time language translation. These aren't hypothetical AI features, but ways you can enhance your chat apps using Azure Communication Services and Azure OpenAI today. In this blog post, we guide you through a quickstart available on GitHub for you to clone and try on your own. We highlight key features and functions, making it easy to follow along. Learn how to upgrade basic chat functionality using AI to analyze user sentiment, summarize conversations, and translate messages in real time.

Natural Language Processing for Chat Messages

First, let's go through the key features of this project:

- Chat management: The Azure Communication Services Chat SDK enables you to manage chat threads and messages, including adding and removing participants in addition to sending messages.
- AI integration: Use Azure OpenAI GPT models to perform:
  - Sentiment analysis: Determine whether user chat messages are positive, negative, or neutral.
  - Summarization: Get a summary of chat threads to understand the key points of a conversation.
  - Translation: Translate messages into different languages.
- RESTful endpoints: Easily integrate these AI capabilities and chat management through RESTful endpoints.
- Event handling (optional): Use Azure Event Grid to handle chat message events and trigger the AI processing.

The starter code for the quickstart is designed to get you up and running quickly. After entering your Azure Communication Services and OpenAI credentials in the config file and running a few commands in your terminal, you can observe the features listed above in action. There are two main components to this example. The first is the ChatClient, which manages capturing and sending messages via a basic chat application using Azure Communication Services. The second component, OpenAIClient, enhances your chat application by transmitting messages to Azure OpenAI along with instructions for the desired types of AI analysis.

AI Analysis with OpenAIClient

Azure OpenAI can perform a multitude of AI analyses, but this quickstart focuses on summarization, sentiment analysis, and translation. To achieve this, we created three distinct system prompts, one for each type of AI analysis we want to perform on our chat messages. These system prompts serve as the instructions for how Azure OpenAI should process the user messages. To summarize a message, we hard-coded a system prompt that says, "Act like you are an agent specialized in generating summary of a chat conversation, you will be provided with a JSON list of messages of a conversation, generate a summary for the conversation based on the content message." Like the best LLM prompts, it's clear, specific, and provides context for the inputs it will get. The system prompts for translation and sentiment analysis follow a similar pattern. The quickstart provides the basic architecture that enables you to take the chat content and pass it to Azure OpenAI for sentiment analysis, translation, and summarization.

The Core Function: getChatCompletions

The getChatCompletions function is a pivotal part of the AI chat sample project. It processes user messages from a chat application, sends them to the OpenAI service for analysis, and returns the AI-generated responses.
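To make the flow concrete, here is a minimal, illustrative sketch of the kind of request such a function issues, written as a plain REST call in C# (the quickstart itself implements this in TypeScript with the Azure OpenAI client). The endpoint, deployment name, API version, and key below are placeholders, and the sample user payload is hypothetical.

using System.Net.Http;
using System.Text;
using System.Text.Json;

// Placeholders: substitute your own Azure OpenAI endpoint, deployment, and key.
var endpoint = "https://<your-openai-resource>.openai.azure.com";
var deployment = "<your-gpt-deployment>";
var apiKey = "<your-api-key>";

using var http = new HttpClient();
http.DefaultRequestHeaders.Add("api-key", apiKey);

// The system prompt instructs the model (here, summarization); the user prompt carries the chat messages as JSON.
var body = JsonSerializer.Serialize(new
{
    messages = new object[]
    {
        new { role = "system", content = "Act like you are an agent specialized in generating summary of a chat conversation..." },
        new { role = "user", content = "[{\"sender\":\"Bob\",\"message\":\"Hi Alice!\"}]" }
    }
});

var url = $"{endpoint}/openai/deployments/{deployment}/chat/completions?api-version=2024-02-15-preview";
HttpResponseMessage response = await http.PostAsync(url, new StringContent(body, Encoding.UTF8, "application/json"));
string json = await response.Content.ReadAsStringAsync();

// The AI-generated analysis is returned in choices[0].message.content.
Console.WriteLine(JsonDocument.Parse(json).RootElement
    .GetProperty("choices")[0].GetProperty("message").GetProperty("content").GetString());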
Here's a detailed breakdown of how it works:

- Parameters: The getChatCompletions function takes in two required parameters:
  - systemPrompt: A string that provides instructions or context to the AI model. This helps guide OpenAI to generate appropriate and relevant responses.
  - userPrompt: A string that contains the actual message from the user. This is what the AI model analyzes and responds to.
- Deployment name: The getChatCompletions function starts by retrieving the deployment name for the OpenAI model from the environment variables.
- Message preparation: The function formats and prepares the messages to send to OpenAI. This includes the system prompt with instructions for the AI model and user prompts that contain the actual chat messages.
- Sending to OpenAI: The function sends these prepared messages to the OpenAI service using the openAiClient's getChatCompletions method. This method interacts with the OpenAI model to generate a response based on the provided prompts.
- Processing the response: The function receives the response from OpenAI, extracts the AI-generated content, logs it, and returns it for further use.

Explore and Customize the Quickstart

The goal of the quickstart is to demonstrate how to connect a chat application and Azure OpenAI, then expand on the capabilities. To run this project locally, make sure you meet the prerequisites and follow the instructions in the GitHub repository. The system prompts and user messages are provided as samples for you to experiment with. The sample chat interaction is quite pleasant. Feel free to play around with the system prompts, change the sample messages between fictional Bob and Alice in client.ts to something more hostile, and see how the analysis changes. Below is an example of changing the sample messages and running the project again.

Real-time messages

For your chat application, you should analyze messages in real time. This demo is designed to simulate that workflow for ease of setup, with messages sent through your local demo server. However, the GitHub repository for this quickstart project provides instructions for implementing this in your actual application. To analyze real-time messages, you can use Azure Event Grid to capture any messages sent to your Azure Communication Services resource along with the necessary chat data. From there, you trigger the function that calls Azure OpenAI with the appropriate context and system prompts for the desired analysis. More information about setting up this workflow is available with "optional" tags in the quickstart's README on GitHub.

Conclusion

Integrating Azure Communication Services with Azure OpenAI enables you to enhance your chat applications with AI analysis and insights. This guide helps you set up a demo that shows sentiment analysis, translation, and summarization, improving user interactions and engagement. To dive deeper into the code, check out the Natural Language Processing of Chat Messages repository, and build your own AI-powered chat application today!