10 Things You Might Not Know You Could Do with Azure Communication Services
Azure Communication Services gives developers the building blocks for voice, video, chat, SMS, and more. But the real magic happens when you start combining those capabilities with other Azure services to solve real-world problems. This blog isn’t a feature list or a product pitch. It’s a collection of creative, practical scenarios that show what’s possible with Azure Communication Services today. Each one is based on real questions, real demos, and real developer experiences. Some are simple. Some are surprisingly powerful. All of them are designed to spark ideas. We’ve included links to sample code, documentation, and visuals to help you dive deeper. And we’ll keep this post updated as new scenarios emerge, so if you’ve built something cool, let us know!

Build a Voice Assistant That Understands Users—and Follows Through

🔎 Quick Look
What it does: Create a voice-first assistant that can understand, respond, and follow up using natural language.
Why it matters: Offers a more intelligent, flexible alternative to traditional IVRs.
What you'll need: Azure Communication Services for voice, Azure OpenAI, and backend logic to handle actions.

Most voice agents are limited to scripted menus or keyword matching. But with Azure Communication Services and Azure OpenAI, you can build a voice experience that actually understands what users are saying and responds with meaningful action. In this demo, a user calls a virtual assistant looking for dinner inspiration. Instead of navigating a rigid menu, they just talk naturally. The assistant interprets the request, asks follow-up questions, and sends a personalized recipe link via SMS—all powered by Azure Communication Services for both the voice and messaging workflows. This kind of voice-first interaction is ideal for customer support, concierge services, or any scenario where users want to speak naturally and get something done. Watch the video below to see the full experience in action or explore the demo yourself here.

Send Responsive Messages in Real-Time

🔎 Quick Look
What it does: Trigger personalized messages based on real-time user behavior (like missed appointments or failed logins).
Why it matters: Helps you move beyond static reminders to more timely, relevant communication.
What you’ll need: Azure Communication Services, Azure Event Grid, Azure OpenAI, and an event source like a Logic App or backend service.

Most messaging systems are built around schedules: send a reminder at 9 AM, a follow-up two days later, and so on. But what if your messages could respond to what your users are doing right now? With Azure Communication Services, you can build event-driven workflows that trigger messages based on real-time behavior. A customer misses an appointment. A user completes a transaction. A login attempt fails. Using Azure Event Grid, you can detect these events, generate a tailored message with Azure OpenAI, and send it instantly via SMS, email, or WhatsApp using Azure Communication Services. This approach helps teams move beyond static, one-size-fits-all messaging. It enables timely, relevant communication that’s easier to maintain and scale - without manually scripting every variation.

Learn more and get started:
Azure Communication Services as an Event Grid source
Handle SMS events with Event Grid
Push notifications overview
Use Event Grid to send calling push notifications

Let Users Schedule Appointments by Text – In Their Own Words

🔎 Quick Look
What it does: Enable natural language scheduling over SMS. No apps, menus, or portals required.
Why it matters: Makes scheduling faster and more user-friendly, especially for service-based businesses.
What you’ll need: Azure Communication Services for SMS, Azure OpenAI to interpret intent, and a backend or Logic App to manage availability and confirmations.

Coordinating appointments over email or phone is slow and manual. Even traditional SMS-based schedulers often rely on rigid decision trees that break when users type something unexpected. This demo takes a smarter approach. By combining Azure Communication Services with Azure OpenAI, it lets users book, confirm, or reschedule appointments through natural, conversational SMS - no app, no portal, no menus. Just text like you normally would: “Hey, can I move my appointment to next Tuesday?” “Do you have anything earlier in the day?” Behind the scenes, Azure Communication Services handles the messaging layer, while OpenAI interprets the user’s intent and routes it to backend logic that manages availability and confirmations. It’s a lightweight, flexible solution that’s ideal for clinics, service providers, or any business that wants to streamline scheduling—without sacrificing user experience. Try the SMS scheduling demo. Everything you need to get started is in the README.

Reach Customers on WhatsApp – Right Alongside SMS & Email

🔎 Quick Look
What it does: Send messages across WhatsApp, SMS, and email from a single workflow.
Why it matters: Increases engagement by meeting users where they are.
What you’ll need: Azure Communication Services with the Advanced Messaging SDK, and verified sender setup for each channel.

Your customers are already on WhatsApp. Now your app can be too, without rearchitecting your entire messaging stack. Azure Communication Services lets you send and receive WhatsApp messages using the same platform you already use for SMS, email, and chat. That means you can reuse your existing workflows, backend logic, and delivery infrastructure - just with a new channel that meets your users where they are. Whether it’s appointment reminders, shipping updates, or live customer support, WhatsApp becomes just another part of your communication toolkit. You can trigger messages using Azure Event Grid, automate replies with Azure Bot Framework, and manage everything through the Advanced Messaging SDK. Want to see it in action? This quickstart guide walks you through registering your WhatsApp Business Account, connecting it to Azure Communication Services, and sending both text and media messages.

> Channels selected from the blade menu.

Learn more:
Overview of Advanced Messaging for WhatsApp
Send text and media WhatsApp messages (Quickstart)
Publish an agent to WhatsApp using Copilot Studio

Let Customers Join a Teams Meeting – Without a Teams Account

🔎 Quick Look
What it does: Embed a browser-based Teams meeting experience into your app or site.
Why it matters: Makes it easy for customers to join secure meetings without downloading Teams or signing in.
What you’ll need: Azure Communication Services with Teams interop, a Teams meeting link, and a web app or portal.

Not every customer wants to download an app or create a Microsoft account just to join a meeting. With Azure Communication Services, you can embed a fully branded, browser-based meeting experience into your app or website that connects directly to a Microsoft Teams meeting - no Teams account required.
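To make that concrete, here is a minimal sketch of the browser-side join using the Calling SDK for JavaScript/TypeScript. The access token, display name, and meeting link shown are placeholders, and the sketch assumes your backend already issues Azure Communication Services access tokens; see the interop quickstart for the full flow.

import { CallClient } from "@azure/communication-calling";
import { AzureCommunicationTokenCredential } from "@azure/communication-common";

// Join a Teams meeting from the browser as an external guest (no Teams account needed).
async function joinTeamsMeeting(userAccessToken: string, meetingLink: string) {
  const callClient = new CallClient();
  const credential = new AzureCommunicationTokenCredential(userAccessToken);
  const callAgent = await callClient.createCallAgent(credential, { displayName: "Guest visitor" });
  // The Teams meeting link acts as the locator; ACS handles the media connection to Teams.
  const call = callAgent.join({ meetingLink }, {});
  return call;
}

From there, you can wire the returned call object into your own UI, or use the UI Library composites if you prefer a prebuilt meeting experience.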
This is especially useful for industries like healthcare, legal, or financial services, where external participants need to join secure consultations or appointments without friction. You control the UI, the branding, and the flow, while Azure Communication Services handles the real-time voice and video connection to Teams. You can see how this works in the interop-quickstart demo, which shows how to create a Teams meeting, generate a join link, and embed the experience in a custom app.

Handle Teams Calls Inside Your CRM—No App Switch Required

🔎 Quick Look
What it does: Let agents make and receive Teams calls directly inside Dynamics 365 or a custom contact center UI.
Why it matters: Reduces context switching and improves agent efficiency.
What you’ll need: Teams Phone Extensibility, Azure Communication Services Call Automation, and Dynamics 365 or another CCaaS.

Most contact center agents juggle multiple tools - CRM, phone, notes, AI assistants - just to handle a single call. But what if they could do it all in one place? With Teams Phone Extensibility, powered by Azure Communication Services, agents can make and receive Teams calls directly inside Dynamics 365 or any custom contact center app. No need to open the Teams client. Here’s what’s possible:
Answer calls in a custom agent desktop, routed through Teams Phone.
Trigger AI workflows mid-call—like summarizing the conversation with Azure OpenAI or escalating to a supervisor.
Initiate outbound calls from bots or workflows using ACS’s Call Automation APIs.
Record and analyze calls with full control over logic and storage.
It’s a surprising way to bring AI, voice, and CRM together, without rebuilding your contact center from scratch.

Embed Secure Video Visits to Your Healthcare App – Fast

🔎 Quick Look
What it does: Add HIPAA-compliant video calling with identity integration.
Why it matters: Enables secure, branded telehealth or consultation experiences.
What you’ll need: Azure Communication Services for video, Azure AD B2C, and a secure frontend.

Telehealth is here to stay. But building a secure, compliant video experience from scratch can be a heavy lift. Azure Communication Services makes it easier. With built-in support for HIPAA, GDPR, and SOC 2, encrypted media transport, and identity integration via Azure AD B2C, Azure Communication Services lets you embed video calling directly into your app—without compromising on privacy or user experience. The Sample Builder shows how to combine video, chat, and SMS into a seamless patient-provider experience. It’s ready to deploy, customize, and scale.

Learn more:
Azure Communication Services HIPAA compliance overview
Quickstart - Add video calling to your app - An Azure Communication Services quickstart | Microsoft Learn

Combine AI and Human Support in a Single Chat Experience

🔎 Quick Look
What it does: Start with an AI assistant and escalate to a human agent with full context.
Why it matters: Scales support while preserving the human touch when needed.
What you’ll need: Azure Communication Services for chat, Azure OpenAI, bot framework, and agent handoff logic.

Most customer service chats start with automation—but they shouldn’t get stuck there. With Azure Communication Services, you can build a chat experience that begins with an AI assistant and hands off to a human agent when it makes sense. This demo shows how it works: a customer starts chatting through a web widget. An AI assistant, powered by Azure OpenAI, handles common questions and tasks.
If the conversation gets complex or the user asks for help, the chat transitions smoothly to a live agent—no context lost. Agents can even generate AI-powered summaries to get up to speed quickly before jumping in. It’s a practical way to scale support without sacrificing the human touch.

On the left, a dialog box displays the user experience, while on the right, the agent's view shows the conversation summary and includes a button to take over the automated chat.

Build a Voice-First AI Virtual Assistant in Under a Week

🔎 Quick Look
What it does: Launch a branded voice assistant quickly using Zammo.ai and ACS.
Why it matters: Speeds up deployment of voice experiences across channels.
What you’ll need: Zammo.ai, Azure Communication Services for voice, and a publishing channel (e.g., Alexa, web).

When Montgomery County, Maryland needed to support COVID-19 vaccine registration, they didn’t have months to build a solution. In just six business days, they launched a voice-first virtual assistant that handled 100% of inbound calls: automating appointment scheduling, supporting English and Spanish, and deflecting thousands of calls from live agents. They partnered with Zammo.ai to build the experience, all without writing custom code. Where Azure Communication Services fits in: Azure Communication Services powered the voice infrastructure, enabling a scalable, multilingual experience that saved time, reduced hold times by 90%, and helped the county serve residents more equitably. Don’t take our word for it: learn more about how it came together here.

Know What You’ll Pay, Before You Ship

🔎 Quick Look
What it does: Estimate costs and usage before you build.
Why it matters: Helps you plan and budget more effectively.
What you’ll need: Azure Communication Services pricing calculator, usage estimator, and billing dashboard.

One of the first questions developers ask when building with Azure Communication Services is: “How much is this going to cost me?” And the answer is: it depends, but in a good way. Azure Communication Services uses a flexible, pay-as-you-go pricing model. You’re only billed for what you use - no upfront commitments, no recurring subscription fees. That makes it easy to prototype, test, and scale without overcommitting. Each communication channel (SMS, email, voice/video calling, and WhatsApp) has its own pricing structure based on usage volume, geography, and delivery method. For example:
SMS to U.S. numbers is priced differently than international messages.
Voice calls vary depending on whether you’re using VoIP, PSTN, or Teams interop.
WhatsApp pricing may involve partner-based rates through Messaging Connect.
There are a few exceptions to the pay-as-you-go model. For instance, leasing a dedicated phone number incurs a monthly fee. But overall, the model is transparent and developer-friendly. To help you estimate costs and plan ahead, here are some helpful resources:
Azure portal pricing calculator: Azure Communication Services pricing | Microsoft Azure
Azure Communication Services Pricing Overview: Azure Communication Services pricing | Microsoft Azure

What Will You Build Next?

Azure Communication Services gives you the flexibility to build the communication experience your users actually want - whether that’s a quick SMS, a secure video call, or a voice assistant that gets things done. And when you combine ACS with other Azure services like OpenAI, Event Grid, and Bot Framework, the possibilities expand even further.
We’ll keep this post updated as new scenarios and demos emerge. If you’ve built something interesting with ACS, we’d love to hear about it—and maybe even feature it in a future post. Check out our official documentation to get started today!

Moving Email Traffic from Exchange to Azure Communication Services
This article describes the important steps and considerations for transitioning from Exchange Online or other on-premises solutions to the Azure Communication Services email platform. It is also relevant for customers migrating from other mail solutions, whether on-premises or hybrid, to Azure Communication Services.
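As an early validation step during a migration, it can help to send a first message through the Azure Communication Services Email SDK. The sketch below uses the JavaScript/TypeScript client; the connection string, sender address, and recipient are placeholders, and the sender domain must already be verified on your Email Communication Services resource.

import { EmailClient } from "@azure/communication-email";

// Minimal sketch: send one test message through the ACS email platform.
const emailClient = new EmailClient("<ACS_CONNECTION_STRING>");

async function sendTestEmail() {
  const poller = await emailClient.beginSend({
    senderAddress: "DoNotReply@<your-verified-domain>",   // verified sender on your ACS email domain
    recipients: { to: [{ address: "user@contoso.com" }] },
    content: {
      subject: "Test message from Azure Communication Services",
      plainText: "This message was sent through the ACS email platform.",
    },
  });
  const result = await poller.pollUntilDone();
  console.log(`Send status: ${result.status}`);
}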
Create next-gen voice agents with Azure AI's Voice Live API and Azure Communication Services

Today at Microsoft Build, we’re excited to announce the General Availability of bidirectional audio streaming for the Azure Communication Services Call Automation SDK, unveiling the power of speech-to-speech AI through Azure Communication Services. As previously seen at Microsoft Ignite in November 2024, the Call Automation bidirectional streaming APIs already work with services like Azure OpenAI to build conversational voice agents through speech-to-speech integrations. Now, with the General Availability release of the Call Automation bidirectional streaming API and the Azure AI Speech Services Voice Live API (Preview), creating voice agents has never been easier. Imagine AI agents that deliver seamless, low-latency, and naturally fluent conversations, transforming the way businesses and customers interact. Bidirectional streaming APIs allow customers to stream audio from ongoing calls to their web server in near real time, where their voice-enabled Large Language Models (LLMs) can ingest the audio to reason over and provide voice responses to stream back into the call. In this release, we added support for extra security with JSON Web Token (JWT) based authentication for the WebSocket connection, allowing developers to make sure they’re creating secure solutions.

As industries like customer service, education, HR, gaming, and public services see a surge in demand for generative AI voice chatbots, businesses are seeking real-time, natural-sounding voice interactions with the latest and greatest GenAI models. Integrating Azure Communication Services with the new Voice Live API from Azure AI Speech Services provides a low-latency interface that facilitates streaming speech input and output with Azure AI Speech’s advanced audio and voice capabilities. It supports multiple languages, diverse voices, and customization, and can even integrate with avatars for enhanced engagement. On the server side, powerful language models interpret the caller's queries and stream human-like responses back in real time, ensuring fluid and engaging conversations. By integrating these two technologies, customers can create new innovative solutions for:

Multilingual agents
Develop virtual customer service representatives capable of having conversations with end customers in their preferred language, allowing customers building for multilingual regions to create one solution that serves multiple languages and regions.

Noise suppression and echo cancellation
For AI voice agents to be effective, they need clear audio to understand what the user is requesting. To improve AI efficiency, you can use the out-of-the-box noise suppression and echo cancellation built into the Voice Live API to give your AI agent the best quality audio, so it can clearly understand end users' requests and assist them.

Support for branded voices
Build voice agents that stay on brand with custom voices that represent your brand in every interaction with the customer. Use Azure AI Speech services to create custom voice models that represent your brand and provide familiarity for your customers.

How to Integrate Azure Communication Services with the Azure AI Speech Service Voice Live API

Language support
With the integration to the Voice Live API, you can now create solutions for more than 150 locales for speech input and output, with 600+ realistic voices out of the box. If these voices don’t suit your needs, you can take this one step further and create custom speech models for your brand.
How to start bidirectional streaming

var mediaStreamingOptions = new MediaStreamingOptions(
    new Uri(websocketUri),
    MediaStreamingContent.Audio,
    MediaStreamingAudioChannel.Mixed,
    startMediaStreaming: true
)
{
    EnableBidirectional = true,
    AudioFormat = AudioFormat.Pcm24KMono
};

How to connect to Voice Live API (Preview)

string GetSessionUpdate()
{
    var jsonObject = new
    {
        type = "session.update",
        session = new
        {
            turn_detection = new
            {
                type = "azure_semantic_vad",
                threshold = 0.3,
                prefix_padding_ms = 200,
                silence_duration_ms = 200,
                remove_filler_words = false
            },
            input_audio_noise_reduction = new
            {
                type = "azure_deep_noise_suppression"
            },
            input_audio_echo_cancellation = new
            {
                type = "server_echo_cancellation"
            },
            voice = new
            {
                name = "en-US-Aria:DragonHDLatestNeural",
                type = "azure-standard",
                temperature = 0.8
            }
        }
    };

    // The original snippet ended here; returning the serialized payload is assumed,
    // so the session.update message can be sent over the WebSocket connection.
    return System.Text.Json.JsonSerializer.Serialize(jsonObject);
}

Next Steps

The SDK and documentation will be available in the next few weeks following this announcement, allowing you to build your own solutions using Azure Communication Services and the Azure AI Voice Live API. You can download our latest sample from GitHub to try this for yourself. To learn more about the Voice Live API and all its different capabilities, see the Azure AI Blog.

Bringing AI to Meetings with the Sample Builder
We’re excited to share a significant update to the Azure Communication Services Sample Builder. This release integrates Azure’s latest AI and video calling capabilities, implementing meeting transcription and AI-generated call summaries to help organizations deliver insightful and effective meeting experiences. In just a few minutes, without the need for any coding, you can use the Sample Builder to start prototyping video calling with Azure AI integration. Click the link below to begin or continue reading for further information.

👉 Try the Sample Builder

Note that this pattern of combining Azure Communication Services and Azure AI for meeting transcription and summarization is not limited to the Sample Builder. You can take the code and overall design pattern and rebuild this experience using the underlying APIs and SDKs.

What Is the Sample Builder?

The Sample Builder is a no-code Azure Portal experience that you use to brand, customize, build, and deploy a GitHub-based sample for prototyping. The sample integrates and deploys multiple Azure services for secure and engaging meetings:
Application hosting of the meeting front-end is provided by Azure App Service.
High-definition video calling for mobile and desktop browsers is provided by Azure Communication Services Calling.
Role-based access for attendees and providers is implemented using Azure Communication Services Rooms.
An accessible, customizable, fluid user experience is built on the open-source Azure Communication Services UI Library.
Designed for developers, IT teams, and solution architects, the Sample Builder gets you started quickly but doesn’t produce a production-ready application. After prototyping, you can take the code from GitHub, customize the user experience, integrate your own systems, and fine-tune the AI interactions for production.

Smarter Meetings with Transcription and Summarization

Today’s update integrates Azure AI Speech and Azure AI Language services directly into your meetings, transforming how companies capture, understand, and act on conversations. You can fine-tune this integration and take advantage of the latest innovation from Azure AI, ensuring your end-users benefit from advancements in LLM models, natural language understanding, and conversation summarization. Transcription and meeting summarization are valuable across industries. For example:
Healthcare: Automatically document patient-provider interactions, reduce administrative burden, and support clinical accuracy.
Financial Services: Capture detailed records of client meetings to meet regulatory requirements and improve transparency.
Education: Provide students and instructors with accessible records of virtual sessions, supporting learning and retention.

Real-Time Transcription

With transcription enabled, Azure Communication Services uses Azure AI Speech to Text to convert spoken language into a live, speaker-attributed transcript. This allows participants to stay fully engaged in the conversation without worrying about taking notes. Key benefits include:
Accurate, multilingual transcription for a wide variety of languages.
You can see the full list of supported languages here.
Speaker attribution for clarity and accountability.
Searchable meeting records for easy reference and knowledge sharing.
Support for multilingual teams, with transcripts that can be translated or reviewed post-meeting.
Training and quality assurance, enabling review of real conversations to improve service delivery.
Transcripts can be stored securely and managed according to your organization’s compliance, privacy, and retention policies, making them suitable for regulated industries.

AI-Generated Call Summaries

After the meeting, the Azure AI Language Summarization API automatically analyzes the transcript and generates a concise, structured summary. This summary distills the conversation into key takeaways, including:
Main discussion points
Decisions made
Action items and next steps
This helps teams:
Align quickly on outcomes and responsibilities
Brief stakeholders who couldn’t attend
Maintain consistent documentation for compliance, audits, or internal reporting
Reduce meeting fatigue by eliminating the need to rewatch or reread entire transcripts

How to Get Started

You can try these new features today by following a few simple steps:
Go through the Sample Builder using the official tutorial.
Select the Rooms option in the booking and calling steps.
Enable auto-transcription or allow users to turn on transcription in the booking and calling steps.
Enable meeting summary in the post-call steps.
Choose how you want to deploy and go through the follow-up steps.
Once fully deployed, start a call to test. If not using auto-transcription, open the meeting controls and select “Start Transcription”. Choose the spoken language (click here for the list of supported languages). After the meeting ends, participants can view the AI-generated summary and download the transcript. In the sample experience, transcripts and summaries are available temporarily. In production environments, you can store them securely and use them to support training, compliance, or analytics.

Microsoft 365 Integration

Today’s update focuses on integrating Azure AI with Azure-hosted meetings. However, Azure Communication Services is interoperable with Microsoft Teams, and you can use the Sample Builder to deploy a branded Azure application that joins Teams meetings. Interoperability can be incredibly helpful for organizations that are already using Microsoft Teams but want to build a custom meeting experience for business-to-consumer (B2C) interactions. Using Microsoft Teams as the meeting host allows you to leverage:
Teams Premium AI features for generating meeting notes and recommending follow-ups.
Teams Premium virtual appointment features for scheduling B2C meetings and sending reminders across SMS, email, and other channels.
Teams Phone capabilities so end-users can dial into the meeting using traditional telephony.

Get Started Today

Explore the new AI-powered features in the Sample Builder and start building smarter virtual appointment experiences:

👉 Try the Sample Builder

With transcription and meeting summaries, your meetings can do more than connect people—they can capture insights, drive action, and deliver lasting value.

Catch Up on the Azure Communication Services Fundamentals Series
This April, we partnered with Microsoft Reactor to deliver a four-part webcast series designed to help developers get started with Azure Communication Services. Each 20-minute episode focused on a different communication channel—giving developers the tools to build real-time, scalable, and secure communication experiences into their apps. If you missed the live sessions, don’t worry: they’re all available on-demand! Check out the full playlist here or see the following individual videos. Here’s a quick look at what each episode covered:

Episode 1: WhatsApp Messaging
We kicked off the series by showing how to integrate WhatsApp Business messaging into your Azure Communication Services applications. We walked through everything from sandbox testing to connecting a verified WhatsApp Business Account and sending messages with SDKs. For a deeper dive on what was covered, read about it here from Gloria herself!

Episode 2: Exploring SMS Capabilities
This session described how to provision a phone number, verify the number, and send and receive SMS messages. We also covered how to handle incoming messages with Event Grid listeners and code-based handlers. (A minimal send example appears at the end of this recap.) For a deeper dive on exactly what was covered, read about it here from Pranita herself!

Episode 3: Maximizing Email Insights with Logs and Events
Next, we dove into email analytics and telemetry: setting up logs and events, understanding sender reputation, and using sample queries to gain insights. From basic sample queries to advanced Kusto Query Language (KQL) queries, this session covered everything you need to run a successful email marketing campaign with Azure Communication Services.

Episode 4: Add Audio & Video Calling
We wrapped the season with a demo-rich session on embedding calling features into your communications application using Azure Communication Services. Highlights included new AI-powered features like captions, noise suppression, grid views, and real-time translation.

What’s Next?
We’re already planning Season 2, launching later this year, with a focus on Azure Communication Services + AI. Expect deeper dives, new use cases, and more interactive demos. Want to stay in the loop? Sign up for season two updates to be the first to know when the new season launches and tell us what you want to learn about!
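As the companion to Episode 2, here is a minimal sketch of sending an SMS with the JavaScript/TypeScript SMS SDK. The connection string and phone numbers are placeholders, and the from number must be an SMS-enabled number provisioned on your Azure Communication Services resource.

import { SmsClient } from "@azure/communication-sms";

const smsClient = new SmsClient("<ACS_CONNECTION_STRING>");

// Send a single SMS and log the per-recipient result.
async function sendReminder() {
  const results = await smsClient.send(
    {
      from: "+18005551234",                                   // your SMS-enabled ACS number
      to: ["+14255550123"],                                    // one or more recipients
      message: "Reminder: your appointment is tomorrow at 10 AM.",
    },
    { enableDeliveryReport: true }                             // optional: delivery report events via Event Grid
  );
  results.forEach((r) => console.log(`${r.to}: ${r.successful ? "sent" : "failed"}`));
}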
Azure Communication Services technical documentation table of contents update

Technical documentation is like a map for using a platform—whether you're building services, solving problems, or learning new features, great documentation shows you the way to the solution you need. But what good is a map if it’s hard to read or confusing to follow? That’s why easy-to-navigate documentation is so important. It saves time, reduces frustration, and helps users focus on what they want to achieve. Azure Communication Services is a powerful platform, and powerful platforms require great documentation for both new and experienced developers. Our customers tell us consistently that our docs are a crucial part of their experience of using our platform. Some studies suggest that documentation and samples are the most important elements of a great developer experience. In this update, we’re excited to share how we’ve improved our technical documentation’s navigation to make it quicker and simpler than ever to find the information you need when you need it.

Why did we change?

In order for our content to be useful to you, it first needs to be findable. When we launched Azure Communication Services, the small number of articles on our site made it easy to navigate and find relevant content. As we’ve grown, though, our content became harder to find for users due to the quantity of articles they need to navigate. To refresh your memory, the table of contents on our docs site used to be structured with these base categories:
Overview
Quickstart
Tutorials
Samples
Concepts
Resources
References
These directory names describe the type of content they contain. This structure is a very useful model for products with a clearly-defined set of use cases, where typically a customer’s job-to-be-done is more constrained, but it breaks down when used for complex, powerful platforms that support a broad range of use cases in the way that Azure Communication Services does. We tried a number of small-scale changes to address the problems people were having on our site, such as having certain directories default to open on page load, but as the site grew, we became concerned that our site navigation model was becoming confusing to users and having a negative impact on their experience with our product. We decided to test that hypothesis and consider different structures that might serve our content and our customers better. Our user research team interviewed 18 customers with varying levels of experience on our platform. The research uncovered several problems that customers were having with the way our docs navigation was structured. From confusing folder titles, to related topics being far away from each other in the nav model, to general confusion around what folder titles meant, to problems finding some of the most basic information about using our platform, and a host of other issues, our user research made it clear to us that we had a problem that we needed to fix for our users.

What did we change in this release?

To help address these issues, we made a few key changes to make our table of contents simpler and easier to navigate. The changes we made were strictly to site navigation, not page content, and they include:
We've restructured the root-level navigation to be focused on communication modality and feature type, rather than content type, to better model our customers' jobs-to-be-done.
Topics include:
All supported communication channels
Horizontal features that span more than one channel
Topics of special interest to our customers, like AI
Basic needs, like troubleshooting and support
This will allow customers to more easily find the content they need by focusing on the job they need to do, rather than on the content type.
We've simplified the overview and fundamentals sections to make the site less overwhelming on first load.
We've surfaced features that customers told us were difficult to find, such as UI Library, Teams interop, and Job router.
We've organized the content within each directory to roughly follow a beginner-to-expert path to make content more linear, and to make it easier for a user to find the next step in completing their task.
We've removed unnecessary layers in our nav, making content easier to find.
We've added a link to pricing information to each primitive to address a common customer complaint that pricing information is difficult to find and understand.
We've combined quickstarts, samples, and tutorials into one directory per primitive, called "Samples and tutorials", to address a customer complaint that our category names were confusing.
We added a directory to each primitive for Resources, to keep important information close by.
We added root-level directories for Common Scenarios, Troubleshooting, and Help and support.
We did a full pass across all TOC entries to ensure correct casing, and edited entries for readability and consistency with page content, as well as for length to adhere to Microsoft guidelines and improve readability.

These changes have led us to a structure that we feel is less taxing for the reader, especially on first visit; maps more closely to the customer’s mental model of the information by focusing on the job-to-be-done rather than content type; helps lead them through the content from easiest to hardest; helps make it easier for them to find the information they need when they need it; and helps remind them of all the different features we support. Here’s what the table of contents looks like on page load as of Feb 6:

These changes are live now. You can see them on the Azure Communication Services technical documentation site.

What’s next:

In the coming weeks we will continue to make refinements based on customer feedback and our assessment of usage metrics. Our content team will begin updating article content to improve readability and enhance learning. We will be monitoring our changes and seeking your feedback.

How will we monitor the effectiveness of our changes?

To track the effectiveness of our changes and to be sure we haven’t regressed, we’ll be tracking a few key metrics:
Bounce rates: We’ll be on the lookout for an increase in bounce rates, which would indicate that customers are frequently landing on pages that don’t meet their expectations.
Page views: We’ll be tracking the number of page views for our most-visited pages across different features. A decrease in page views for these pages will be an indicator that customers are not able to find pages that had previously been popular.
Customer interviews: We will be reaching out to some of you to get your impressions of the new structure of our content over the coming weeks.
Customer surveys: We've created a survey that you can use to give us your feedback. We'll also be adding this link to select pages to allow you to tell us what you think of our changes while you're using them!
So, give our new site navigation a try, and please don’t hesitate to share your feedback either by filling out our survey or by sending an email to acs-docs-feedback@microsoft.com. We look forward to hearing from you!

AI-Powered Chat with Azure Communication Services and Azure OpenAI
Many applications offer chat with automated capabilities but lack the depth to fully understand and address user needs. What if a chat app could not only connect people but also improve conversations with AI insights? Imagine detecting customer sentiment, bringing in experts as needed, and supporting global customers with real-time language translation. These aren’t hypothetical AI features, but ways you can enhance your chat apps using Azure Communication Services and Azure OpenAI today. In this blog post, we guide you through a quickstart available on GitHub for you to clone and try on your own. We highlight key features and functions, making it easy to follow along. Learn how to upgrade basic chat functionality using AI to analyze user sentiment, summarize conversations, and translate messages in real time.

Natural Language Processing for Chat Messages

First, let’s go through the key features of this project.
Chat Management: The Azure Communication Services Chat SDK enables you to manage chat threads and messages, including adding and removing participants in addition to sending messages.
AI Integration: Use Azure OpenAI GPT models to perform:
Sentiment Analysis: Determine if user chat messages are positive, negative, or neutral.
Summarization: Get a summary of chat threads to understand the key points of a conversation.
Translation: Translate into different languages.
RESTful endpoints: Easily integrate these AI capabilities and chat management through RESTful endpoints.
Event Handling (optional): Use Azure Event Grid to handle chat message events and trigger the AI processing.
The starter code for the quickstart is designed to get you up and running quickly. After entering your Azure Communication Services and OpenAI credentials in the config file and running a few commands in your terminal, you can observe the features listed above in action. There are two main components to this example. The first is the ChatClient, which manages the capturing and sending of messages via a basic chat application using Azure Communication Services. The second component, OpenAIClient, enhances your chat application by transmitting messages to Azure OpenAI along with instructions for the desired types of AI analysis.

AI Analysis with OpenAIClient

Azure OpenAI can perform a multitude of AI analyses, but this quickstart focuses on summarization, sentiment analysis, and translation. To achieve this, we created three distinct prompts, one for each type of AI analysis we want to perform on our chat messages. These system prompts serve as the instructions for how Azure OpenAI should process the user messages. To summarize a message, we hard-coded a system prompt that says, “Act like you are an agent specialized in generating summary of a chat conversation, you will be provided with a JSON list of messages of a conversation, generate a summary for the conversation based on the content message.” Like the best LLM prompts, it’s clear, specific, and provides context for the inputs it will get. The system prompts for translating and sentiment analysis follow a similar pattern. The quickstart provides the basic architecture that enables you to take the chat content and pass it to Azure OpenAI for analysis, such as sentiment analysis, translation, and summarization.

The Core Function: getChatCompletions

The getChatCompletions function is a pivotal part of the AI chat sample project. It processes user messages from a chat application, sends them to the OpenAI service for analysis, and returns the AI-generated responses.
Here’s a detailed breakdown of how it works:

Parameters
The getChatCompletions function takes in two required parameters:
systemPrompt: A string that provides instructions or context to the AI model. This helps guide OpenAI to generate appropriate and relevant responses.
userPrompt: A string that contains the actual message from the user. This is what the AI model analyzes and responds to.

Deployment Name: The getChatCompletions function starts by retrieving the deployment name for the OpenAI model from the environment variables.
Message Preparation: The function formats and prepares messages to send to OpenAI. This includes the system prompt with instructions for the AI model and user prompts that contain the actual chat messages.
Sending to OpenAI: The function sends these prepared messages to the OpenAI service using the openAiClient’s getChatCompletions method. This method interacts with the OpenAI model to generate a response based on the provided prompts.
Processing the Response: The function receives the response from OpenAI, extracts the AI-generated content, logs it, and returns it for further use. (A sketch approximating this flow appears at the end of this post.)

Explore and Customize the Quickstart

The goal of the quickstart is to demonstrate how to connect a chat application and Azure OpenAI, then expand on the capabilities. To run this project locally, make sure you meet the prerequisites and follow the instructions in the GitHub repository. The system prompts and user messages are provided as samples for you to experiment with. The sample chat interaction is quite pleasant. Feel free to play around with the system prompts and change the sample messages between fictional Bob and Alice in client.ts to something more hostile and see how the analysis changes. Below is an example of changing the sample messages and running the project again.

Real-time messages

For your chat application, you should analyze messages in real time. This demo is designed to simulate that workflow for ease of setup, with messages sent through your local demo server. However, the GitHub repository for this quickstart project provides instructions for implementing this in your actual application. To analyze real-time messages, you can use Azure Event Grid to capture any messages sent to your Azure Communication Services resource along with the necessary chat data. From there, you trigger the function that calls Azure OpenAI with the appropriate context and system prompts for the desired analysis. More information about setting up this workflow is available with "optional" tags in the quickstart's README on GitHub.

Conclusion

Integrating Azure Communication Services with Azure OpenAI enables you to enhance your chat applications with AI analysis and insights. This guide helps you set up a demo that shows sentiment analysis, translation, and summarization, improving user interactions and engagement. To dive deeper into the code, check out the Natural Language Processing of Chat Messages repository, and build your own AI-powered chat application today!
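To ground the breakdown above, here is a hedged TypeScript sketch approximating the quickstart's getChatCompletions helper using the @azure/openai client. The endpoint, key, environment variable name, and deployment name are placeholders; the actual sample in the repository may differ in detail.

import { OpenAIClient, AzureKeyCredential } from "@azure/openai";

const openAiClient = new OpenAIClient(
  "<AZURE_OPENAI_ENDPOINT>",
  new AzureKeyCredential("<AZURE_OPENAI_KEY>")
);

// Send a system prompt plus a user message to Azure OpenAI and return the generated content.
async function getChatCompletions(systemPrompt: string, userPrompt: string): Promise<string> {
  const deploymentName = process.env.AZURE_OPENAI_DEPLOYMENT_NAME ?? "<deployment-name>"; // placeholder
  const response = await openAiClient.getChatCompletions(deploymentName, [
    { role: "system", content: systemPrompt },
    { role: "user", content: userPrompt },
  ]);
  const content = response.choices[0]?.message?.content ?? "";
  console.log(content);
  return content;
}

// Example: sentiment analysis of a single chat message.
// await getChatCompletions(
//   "Act like an agent specialized in sentiment analysis. Reply with positive, negative, or neutral.",
//   "The support I received today was fantastic, thank you!"
// );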
Build your own real-time voice agent - Announcing preview of bidirectional audio streaming APIs

We are pleased to announce the public preview of bidirectional audio streaming, enhancing the capabilities of voice-based conversational AI. During Satya Nadella’s keynote at Ignite, Seth Juarez demonstrated a voice agent engaging in a live phone conversation with a customer. You can now create similar experiences using Azure Communication Services bidirectional audio streaming APIs and the GPT-4o model. In our recent Ignite blog post, we announced the upcoming preview of our audio streaming APIs. Now that it is publicly available, this blog describes how to use the bidirectional audio streaming APIs available in the Azure Communication Services Call Automation SDK to build low-latency voice agents powered by the GPT-4o Realtime API.

How does the bidirectional audio streaming API enhance the quality of voice-driven agent experiences?

AI-powered agents facilitate seamless, human-like interactions and can engage with users through various channels such as chat or voice. In the context of voice communication, low latency in conversational responses is crucial, as delays can cause users to perceive a lack of response and disrupt the flow of conversation. Gone are the days when building a voice bot required stitching together multiple models for transcription, inference, and text-to-speech conversion. Developers can now stream live audio from an ongoing call (VoIP or telephony) to their backend server logic using the bidirectional audio streaming APIs, leverage GPT-4o to process audio input, and deliver responses back with minimal latency for the caller/user.

Building Your Own Real-Time Voice Agent

In this section, we walk you through a quickstart for using Call Automation’s audio streaming APIs to build a voice agent. Before you begin, ensure you have the following:
Active Azure Subscription: Create an account for free.
Azure Communication Services Resource: Create an Azure Communication Services resource and record your resource connection string for later use.
Azure Communication Services Phone Number: A calling-enabled phone number. You can buy a new phone number or use a free trial number.
Azure Dev Tunnels CLI: For details, see Enable dev tunnel.
Azure OpenAI Resource: Set up an Azure OpenAI resource by following the instructions in Create and deploy an Azure OpenAI Service resource.
Azure OpenAI Service Model: To use this sample, you must have the GPT-4o-Realtime-Preview model deployed. Follow the instructions at GPT-4o Realtime API for speech and audio (Preview) to set it up.
Development Environment: Familiarity with .NET and basic asynchronous programming.
Clone the quickstart sample application: You can find the quickstart at Azure Communication Services Call Automation and Azure OpenAI Service.

git clone https://github.com/Azure-Samples/communication-services-dotnet-quickstarts.git

After completing the prerequisites, open the cloned project and follow these setup steps.

Environment Setup

Before running this sample, you need to set up the previously mentioned resources with the following configuration updates:

1. Set up and host your Azure dev tunnel

Azure Dev Tunnels is an Azure service that enables you to expose locally hosted web services to the internet. Use the following commands to connect your local development environment to the public internet. This creates a tunnel with a persistent endpoint URL and enables anonymous access. We use this endpoint to notify your application of calling events from the Azure Communication Services Call Automation service.
devtunnel create --allow-anonymous
devtunnel port create -p 5165
devtunnel host

2. Navigate to the quickstart CallAutomation_AzOpenAI_Voice from the project you cloned.

3. Add the required API keys and endpoints

Open the appsettings.json file and add values for the following settings:
DevTunnelUri: Your dev tunnel endpoint
AcsConnectionString: Azure Communication Services resource connection string
AzureOpenAIServiceKey: OpenAI Service Key
AzureOpenAIServiceEndpoint: OpenAI Service Endpoint
AzureOpenAIDeploymentModelName: OpenAI Model name

Run the Application

Ensure your Azure Dev Tunnel URI is active and points to the correct port of your localhost application.
Run the command dotnet run to build and run the sample application.
Register an Event Grid webhook for the IncomingCall event that points to your dev tunnel URI (https://<your-devtunnel-uri>/api/incomingCall). For more information, see Incoming call concepts.

Test the app

Once the application is running:
Call your Azure Communication Services number: Dial the number set up in your Azure Communication Services resource. A voice agent answers, enabling you to converse naturally.
View the transcription: See a live transcription in the console window.

QuickStart Walkthrough

Now that the app is running and testable, let’s explore the quickstart code snippet and how to use the new APIs. Within the program.cs file, the endpoint /api/incomingCall handles inbound calls.

app.MapPost("/api/incomingCall", async (
    [FromBody] EventGridEvent[] eventGridEvents,
    ILogger<Program> logger) =>
{
    foreach (var eventGridEvent in eventGridEvents)
    {
        Console.WriteLine($"Incoming Call event received.");

        // Handle system events
        if (eventGridEvent.TryGetSystemEventData(out object eventData))
        {
            // Handle the subscription validation event.
            if (eventData is SubscriptionValidationEventData subscriptionValidationEventData)
            {
                var responseData = new SubscriptionValidationResponse
                {
                    ValidationResponse = subscriptionValidationEventData.ValidationCode
                };
                return Results.Ok(responseData);
            }
        }

        var jsonObject = Helper.GetJsonObject(eventGridEvent.Data);
        var callerId = Helper.GetCallerId(jsonObject);
        var incomingCallContext = Helper.GetIncomingCallContext(jsonObject);
        var callbackUri = new Uri(new Uri(appBaseUrl), $"/api/callbacks/{Guid.NewGuid()}?callerId={callerId}");
        logger.LogInformation($"Callback Url: {callbackUri}");
        var websocketUri = appBaseUrl.Replace("https", "wss") + "/ws";
        logger.LogInformation($"WebSocket Url: {websocketUri}");

        var mediaStreamingOptions = new MediaStreamingOptions(
            new Uri(websocketUri),
            MediaStreamingContent.Audio,
            MediaStreamingAudioChannel.Mixed,
            startMediaStreaming: true
        )
        {
            EnableBidirectional = true,
            AudioFormat = AudioFormat.Pcm24KMono
        };

        var options = new AnswerCallOptions(incomingCallContext, callbackUri)
        {
            MediaStreamingOptions = mediaStreamingOptions,
        };

        AnswerCallResult answerCallResult = await client.AnswerCallAsync(options);
        logger.LogInformation($"Answered call for connection id: {answerCallResult.CallConnection.CallConnectionId}");
    }
    return Results.Ok();
});

In the preceding code, MediaStreamingOptions encapsulates all the configurations for bidirectional streaming.
WebSocketUri: We use the dev tunnel URI with the WebSocket protocol, appending the path /ws. This path manages the WebSocket messages.
MediaStreamingContent: The current version of the API supports only audio.
Audio Channel: Supported formats include:
Mixed: Contains the combined audio streams of all participants on the call, flattened into one stream.
Unmixed: Contains a single audio stream per participant per channel, with support for up to four channels for the most dominant speakers at any given time. You also get a participantRawID to identify the speaker.
StartMediaStreaming: This flag, when set to true, enables the bidirectional stream automatically once the call is established.
EnableBidirectional: This enables audio sending and receiving. By default, it only receives audio data from Azure Communication Services to your application.
AudioFormat: This can be either 16k pulse code modulation (PCM) mono or 24k PCM mono.

Once you configure all these settings, you need to pass them to AnswerCallOptions. Now that the call is established, let’s dive into the code that handles WebSocket messages. This code snippet handles the audio data received over the WebSocket. The WebSocket’s path is specified as /ws, which corresponds to the WebSocketUri provided in the configuration.

app.Use(async (context, next) =>
{
    if (context.Request.Path == "/ws")
    {
        if (context.WebSockets.IsWebSocketRequest)
        {
            try
            {
                var webSocket = await context.WebSockets.AcceptWebSocketAsync();
                var mediaService = new AcsMediaStreamingHandler(webSocket, builder.Configuration);

                // Set the single WebSocket connection
                await mediaService.ProcessWebSocketAsync();
            }
            catch (Exception ex)
            {
                Console.WriteLine($"Exception received {ex}");
            }
        }
        else
        {
            context.Response.StatusCode = StatusCodes.Status400BadRequest;
        }
    }
    else
    {
        await next(context);
    }
});

The method await mediaService.ProcessWebSocketAsync() processes all incoming messages. It establishes a connection with OpenAI, initiates a conversation session, and waits for a response from OpenAI. This method ensures seamless communication between the application and OpenAI, enabling real-time audio data processing and interaction.

// Method to receive messages from WebSocket
public async Task ProcessWebSocketAsync()
{
    if (m_webSocket == null)
    {
        return;
    }

    // Start forwarder to AI model
    m_aiServiceHandler = new AzureOpenAIService(this, m_configuration);
    try
    {
        m_aiServiceHandler.StartConversation();
        await StartReceivingFromAcsMediaWebSocket();
    }
    catch (Exception ex)
    {
        Console.WriteLine($"Exception -> {ex}");
    }
    finally
    {
        m_aiServiceHandler.Close();
        this.Close();
    }
}

Once the application receives data from Azure Communication Services, it parses the incoming JSON payload to extract the audio data segment. The application then forwards the segment to OpenAI for further processing. The parsing ensures data integrity before sending it to OpenAI for analysis.
// Receive messages from WebSocket
private async Task StartReceivingFromAcsMediaWebSocket()
{
    if (m_webSocket == null)
    {
        return;
    }
    try
    {
        while (m_webSocket.State == WebSocketState.Open || m_webSocket.State == WebSocketState.Closed)
        {
            // Note: the buffer size was lost in the original snippet; 2048 bytes is assumed here.
            byte[] receiveBuffer = new byte[2048];
            WebSocketReceiveResult receiveResult = await m_webSocket.ReceiveAsync(new ArraySegment<byte>(receiveBuffer), m_cts.Token);
            if (receiveResult.MessageType != WebSocketMessageType.Close)
            {
                string data = Encoding.UTF8.GetString(receiveBuffer).TrimEnd('\0');
                await WriteToAzOpenAIServiceInputStream(data);
            }
        }
    }
    catch (Exception ex)
    {
        Console.WriteLine($"Exception -> {ex}");
    }
}

Here is how the application parses and forwards the data segment to OpenAI using the established session:

private async Task WriteToAzOpenAIServiceInputStream(string data)
{
    var input = StreamingData.Parse(data);
    if (input is AudioData audioData)
    {
        using (var ms = new MemoryStream(audioData.Data))
        {
            await m_aiServiceHandler.SendAudioToExternalAI(ms);
        }
    }
}

Once the application receives the response from OpenAI, it formats the data to be forwarded to Azure Communication Services and relays the response in the call. If the application detects voice activity while OpenAI is talking, it sends a barge-in message to Azure Communication Services to manage the voice playing in the call.

// Loop and wait for the AI response
private async Task GetOpenAiStreamResponseAsync()
{
    try
    {
        await m_aiSession.StartResponseAsync();
        await foreach (ConversationUpdate update in m_aiSession.ReceiveUpdatesAsync(m_cts.Token))
        {
            if (update is ConversationSessionStartedUpdate sessionStartedUpdate)
            {
                Console.WriteLine($"<<< Session started. ID: {sessionStartedUpdate.SessionId}");
                Console.WriteLine();
            }

            if (update is ConversationInputSpeechStartedUpdate speechStartedUpdate)
            {
                Console.WriteLine($" -- Voice activity detection started at {speechStartedUpdate.AudioStartTime} ms");
                // Barge-in, send stop audio
                var jsonString = OutStreamingData.GetStopAudioForOutbound();
                await m_mediaStreaming.SendMessageAsync(jsonString);
            }

            if (update is ConversationInputSpeechFinishedUpdate speechFinishedUpdate)
            {
                Console.WriteLine($" -- Voice activity detection ended at {speechFinishedUpdate.AudioEndTime} ms");
            }

            if (update is ConversationItemStreamingStartedUpdate itemStartedUpdate)
            {
                Console.WriteLine($" -- Begin streaming of new item");
            }

            // Audio transcript updates contain the incremental text matching the generated output audio.
            if (update is ConversationItemStreamingAudioTranscriptionFinishedUpdate outputTranscriptDeltaUpdate)
            {
                Console.Write(outputTranscriptDeltaUpdate.Transcript);
            }

            // Audio delta updates contain the incremental binary audio data of the generated output audio
            // matching the output audio format configured for the session.
            if (update is ConversationItemStreamingPartDeltaUpdate deltaUpdate)
            {
                if (deltaUpdate.AudioBytes != null)
                {
                    var jsonString = OutStreamingData.GetAudioDataForOutbound(deltaUpdate.AudioBytes.ToArray());
                    await m_mediaStreaming.SendMessageAsync(jsonString);
                }
            }

            if (update is ConversationItemStreamingTextFinishedUpdate itemFinishedUpdate)
            {
                Console.WriteLine();
                Console.WriteLine($" -- Item streaming finished, response_id={itemFinishedUpdate.ResponseId}");
            }

            if (update is ConversationInputTranscriptionFinishedUpdate transcriptionCompletedUpdate)
            {
                Console.WriteLine();
                Console.WriteLine($" -- User audio transcript: {transcriptionCompletedUpdate.Transcript}");
                Console.WriteLine();
            }

            if (update is ConversationResponseFinishedUpdate turnFinishedUpdate)
            {
                Console.WriteLine($" -- Model turn generation finished. Status: {turnFinishedUpdate.Status}");
            }

            if (update is ConversationErrorUpdate errorUpdate)
            {
                Console.WriteLine();
                Console.WriteLine($"ERROR: {errorUpdate.Message}");
                break;
            }
        }
    }
    catch (OperationCanceledException e)
    {
        Console.WriteLine($"{nameof(OperationCanceledException)} thrown with message: {e.Message}");
    }
    catch (Exception ex)
    {
        Console.WriteLine($"Exception during AI streaming -> {ex}");
    }
}

Once the data is prepared for Azure Communication Services, the application sends the data over the WebSocket:

public async Task SendMessageAsync(string message)
{
    if (m_webSocket?.State == WebSocketState.Open)
    {
        byte[] jsonBytes = Encoding.UTF8.GetBytes(message);

        // Send the serialized JSON payload (audio or control message) over the WebSocket
        await m_webSocket.SendAsync(new ArraySegment<byte>(jsonBytes), WebSocketMessageType.Text, endOfMessage: true, CancellationToken.None);
    }
}

This wraps up our QuickStart overview. We hope you create outstanding voice agents with the new audio streaming APIs. Happy coding!

For more information about Azure Communication Services bidirectional audio streaming APIs, check out:
GPT-4o Realtime API for speech and audio (Preview)
Audio streaming overview - audio subscription
Quickstart - Server-side Audio Streaming